A commercially successful movie not only entertains its audience but also earns film companies tremendous profit. Many factors, such as good directors and experienced actors, contribute to making a good movie. However, famous directors and actors can usually bring the expected box-office income but cannot guarantee a highly rated IMDB score.
The dataset comes from Kaggle. It contains 28 variables for 5043 movies, spanning 100 years and 66 countries. There are 2399 unique director names and thousands of actors and actresses. “imdb_score” is the response variable, while the other 27 variables are possible predictors.
| Variable Name | Description |
|---|---|
| movie_title | Title of the Movie |
| duration | Duration in minutes |
| director_name | Name of the Director of the Movie |
| director_facebook_likes | Number of likes of the Director on his/her Facebook Page |
| actor_1_name | Primary actor starring in the movie |
| actor_1_facebook_likes | Number of likes of the Actor_1 on his/her Facebook Page |
| actor_2_name | Other actor starring in the movie |
| actor_2_facebook_likes | Number of likes of the Actor_2 on his/her Facebook Page |
| actor_3_name | Other actor starring in the movie |
| actor_3_facebook_likes | Number of likes of the Actor_3 on his/her Facebook Page |
| num_user_for_reviews | Number of users who gave a review |
| num_critic_for_reviews | Number of critic reviews on IMDB |
| num_voted_users | Number of people who voted for the movie |
| cast_total_facebook_likes | Total number of Facebook likes of the entire cast of the movie |
| movie_facebook_likes | Number of Facebook likes on the movie page |
| plot_keywords | Keywords describing the movie plot |
| facenumber_in_poster | Number of actors' faces featured in the movie poster |
| color | Film colorization. ‘Black and White’ or ‘Color’ |
| genres | Film categorization like ‘Animation’, ‘Comedy’, ‘Romance’, ‘Horror’, ‘Sci-Fi’, ‘Action’, ‘Family’ |
| title_year | The year in which the movie was released (1916 to 2016) |
| language | English, Arabic, Chinese, French, German, Danish, Italian, Japanese etc |
| country | Country where the movie is produced |
| content_rating | Content rating of the movie |
| aspect_ratio | Aspect ratio the movie was made in |
| movie_imdb_link | IMDB link of the movie |
| gross | Gross earnings of the movie in Dollars |
| budget | Budget of the movie in Dollars |
| imdb_score | IMDB Score of the movie on IMDB |
Given this large amount of movie information, it would be interesting to understand which factors make one movie more successful than another. We would therefore like to analyze what kinds of movies are more successful, in other words, which ones get a higher IMDB score.
In this notebook we are going to build two different kinds of models: Regression and Classification. For each kind, we start with a basic model, move to a more advanced one, and describe why we chose the advanced one.
Under Regression, we are going to fit a regression model to our data and predict the continuous target variable imdb_score.
Under Classification, we are going to fit a classification model to our data and classify the imdb_score into three categories.
| imdb_score | Classify |
|---|---|
| 1-3 | Flop Movie |
| 3-6 | Average Movie |
| 6-10 | Hit Movie |
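The binning in the table above could be sketched with `pd.cut`. This is only a minimal illustration on made-up scores, not the notebook's actual classification step. The table does not say which category a boundary score such as 3.0 or 6.0 belongs to, so treating each upper edge as inclusive is an assumption here.

```python
import pandas as pd

# Toy scores standing in for dataset['imdb_score'] (values are made up).
scores = pd.Series([2.5, 3.0, 5.9, 6.0, 7.9, 9.5], name="imdb_score")

# Bin edges follow the table above. Assumption: upper edges are inclusive,
# so 3.0 falls in "Flop Movie" and 6.0 falls in "Average Movie".
bins = [1, 3, 6, 10]
labels = ["Flop Movie", "Average Movie", "Hit Movie"]

categories = pd.cut(scores, bins=bins, labels=labels, right=True, include_lowest=True)
print(categories.tolist())
```

If the opposite boundary convention is wanted, passing `right=False` makes the lower edge of each bin inclusive instead.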
#importing the libraries that we use
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas_profiling as pp
#importing the dataset
dataset = pd.read_csv('../input/imdb-5000-movie-dataset/movie_metadata.csv')
dataset.head()
| | color | director_name | num_critic_for_reviews | duration | director_facebook_likes | actor_3_facebook_likes | actor_2_name | actor_1_facebook_likes | gross | genres | ... | num_user_for_reviews | language | country | content_rating | budget | title_year | actor_2_facebook_likes | imdb_score | aspect_ratio | movie_facebook_likes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Color | James Cameron | 723.0 | 178.0 | 0.0 | 855.0 | Joel David Moore | 1000.0 | 760505847.0 | Action\|Adventure\|Fantasy\|Sci-Fi | ... | 3054.0 | English | USA | PG-13 | 237000000.0 | 2009.0 | 936.0 | 7.9 | 1.78 | 33000 |
| 1 | Color | Gore Verbinski | 302.0 | 169.0 | 563.0 | 1000.0 | Orlando Bloom | 40000.0 | 309404152.0 | Action\|Adventure\|Fantasy | ... | 1238.0 | English | USA | PG-13 | 300000000.0 | 2007.0 | 5000.0 | 7.1 | 2.35 | 0 |
| 2 | Color | Sam Mendes | 602.0 | 148.0 | 0.0 | 161.0 | Rory Kinnear | 11000.0 | 200074175.0 | Action\|Adventure\|Thriller | ... | 994.0 | English | UK | PG-13 | 245000000.0 | 2015.0 | 393.0 | 6.8 | 2.35 | 85000 |
| 3 | Color | Christopher Nolan | 813.0 | 164.0 | 22000.0 | 23000.0 | Christian Bale | 27000.0 | 448130642.0 | Action\|Thriller | ... | 2701.0 | English | USA | PG-13 | 250000000.0 | 2012.0 | 23000.0 | 8.5 | 2.35 | 164000 |
| 4 | NaN | Doug Walker | NaN | NaN | 131.0 | NaN | Rob Walker | 131.0 | NaN | Documentary | ... | NaN | NaN | NaN | NaN | NaN | NaN | 12.0 | 7.1 | NaN | 0 |
5 rows × 28 columns
dataset.shape
(5043, 28)
dataset.columns
Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
dtype='object')
dataset.profile_report()
dataset.drop_duplicates(inplace = True)
dataset.shape
(4998, 28)
Data cleaning is one of the most important parts of building a model. Here we apply standard preprocessing steps to make sure our model is not fed bad data.
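As a rough sketch of the kind of imputation used in the cells that follow, the hypothetical helper below fills numeric gaps with the median and non-numeric gaps with the mode. The name `impute_simple` and the toy frame are assumptions for illustration. The notebook itself handles each column separately and mixes strategies (mode, median, mean, or dropping rows), so this is a simplified stand-in rather than the exact procedure.

```python
import numpy as np
import pandas as pd

def impute_simple(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric NaNs filled by the median and other NaNs by the mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isnull().sum() == 0:
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            fill_value = out[col].median()
        else:
            # mode() can return several values; take the first one.
            # Note: this would fail on a column that is entirely missing.
            fill_value = out[col].mode().iloc[0]
        out[col] = out[col].fillna(fill_value)
    return out

# Toy frame with made-up values, standing in for the movie dataset.
toy = pd.DataFrame({
    "duration": [100.0, np.nan, 120.0, 90.0],
    "color": ["Color", "Color", None, " Black and White"],
})
cleaned = impute_simple(toy)
print(cleaned)
```

Assigning the result of `fillna` back to the column, rather than calling it with `inplace=True` on an attribute, avoids depending on whether pandas hands back a view or a copy.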
numerical_cols = [col for col in dataset.columns if dataset[col].dtype != 'object']
categorical_cols = [col for col in dataset.columns if dataset[col].dtype == 'object']
categorical_cols, numerical_cols
(['color', 'director_name', 'actor_2_name', 'genres', 'actor_1_name', 'movie_title', 'actor_3_name', 'plot_keywords', 'movie_imdb_link', 'language', 'country', 'content_rating'], ['num_critic_for_reviews', 'duration', 'director_facebook_likes', 'actor_3_facebook_likes', 'actor_1_facebook_likes', 'gross', 'num_voted_users', 'cast_total_facebook_likes', 'facenumber_in_poster', 'num_user_for_reviews', 'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score', 'aspect_ratio', 'movie_facebook_likes'])
dataset[numerical_cols].describe()
| | num_critic_for_reviews | duration | director_facebook_likes | actor_3_facebook_likes | actor_1_facebook_likes | gross | num_voted_users | cast_total_facebook_likes | facenumber_in_poster | num_user_for_reviews | budget | title_year | actor_2_facebook_likes | imdb_score | aspect_ratio | movie_facebook_likes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4949.000000 | 4983.000000 | 4895.000000 | 4975.000000 | 4991.000000 | 4.124000e+03 | 4.998000e+03 | 4998.000000 | 4985.000000 | 4977.000000 | 4.511000e+03 | 4891.000000 | 4985.000000 | 4998.000000 | 4671.000000 | 4998.000000 |
| mean | 139.890079 | 107.213325 | 688.679060 | 639.900905 | 6556.939892 | 4.832565e+07 | 8.347020e+04 | 9676.941176 | 1.368907 | 272.014667 | 3.974787e+07 | 2002.468820 | 1642.998796 | 6.441056 | 2.221417 | 7487.430172 |
| std | 121.477586 | 25.248775 | 2821.649616 | 1643.298282 | 15061.586700 | 6.796483e+07 | 1.380866e+05 | 18165.404578 | 2.014623 | 377.776210 | 2.069689e+08 | 12.475235 | 4030.925303 | 1.124107 | 1.391185 | 19290.726563 |
| min | 1.000000 | 7.000000 | 0.000000 | 0.000000 | 0.000000 | 1.620000e+02 | 5.000000e+00 | 0.000000 | 0.000000 | 1.000000 | 2.180000e+02 | 1916.000000 | 0.000000 | 1.600000 | 1.180000 | 0.000000 |
| 25% | 50.000000 | 93.000000 | 7.000000 | 133.000000 | 611.500000 | 5.304835e+06 | 8.560000e+03 | 1405.500000 | 0.000000 | 64.000000 | 6.000000e+06 | 1999.000000 | 280.000000 | 5.800000 | 1.850000 | 0.000000 |
| 50% | 110.000000 | 103.000000 | 49.000000 | 369.000000 | 984.000000 | 2.544575e+07 | 3.426050e+04 | 3085.500000 | 1.000000 | 156.000000 | 2.000000e+07 | 2005.000000 | 595.000000 | 6.600000 | 2.350000 | 162.500000 |
| 75% | 195.000000 | 118.000000 | 192.000000 | 635.000000 | 11000.000000 | 6.231942e+07 | 9.612075e+04 | 13740.500000 | 2.000000 | 324.000000 | 4.500000e+07 | 2011.000000 | 917.000000 | 7.200000 | 2.350000 | 3000.000000 |
| max | 813.000000 | 511.000000 | 23000.000000 | 23000.000000 | 640000.000000 | 7.605058e+08 | 1.689764e+06 | 656730.000000 | 43.000000 | 5060.000000 | 1.221550e+10 | 2016.000000 | 137000.000000 | 9.500000 | 16.000000 | 349000.000000 |
dataset[categorical_cols].describe()
| | color | director_name | actor_2_name | genres | actor_1_name | movie_title | actor_3_name | plot_keywords | movie_imdb_link | language | country | content_rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4979 | 4895 | 4985 | 4998 | 4991 | 4998 | 4975 | 4846 | 4998 | 4986 | 4993 | 4697 |
| unique | 2 | 2398 | 3032 | 914 | 2097 | 4917 | 3521 | 4760 | 4919 | 47 | 65 | 18 |
| top | Color | Steven Spielberg | Morgan Freeman | Drama | Robert De Niro | Ben-Hur | Ben Mendelsohn | based on novel | http://www.imdb.com/title/tt2638144/?ref_=fn_t... | English | USA | R |
| freq | 4772 | 26 | 20 | 235 | 49 | 3 | 8 | 4 | 3 | 4662 | 3773 | 2098 |
dataset.isnull().sum()
color 19
director_name 103
num_critic_for_reviews 49
duration 15
director_facebook_likes 103
actor_3_facebook_likes 23
actor_2_name 13
actor_1_facebook_likes 7
gross 874
genres 0
actor_1_name 7
movie_title 0
num_voted_users 0
cast_total_facebook_likes 0
actor_3_name 23
facenumber_in_poster 13
plot_keywords 152
movie_imdb_link 0
num_user_for_reviews 21
language 12
country 5
content_rating 301
budget 487
title_year 107
actor_2_facebook_likes 13
imdb_score 0
aspect_ratio 327
movie_facebook_likes 0
dtype: int64
dataset.color.unique()
array(['Color', nan, ' Black and White'], dtype=object)
color_mode = dataset['color'].mode().iloc[0]
dataset.color.fillna(color_mode, inplace = True)
dataset.color.isnull().sum()
0
dataset.director_name.nunique(), dataset.director_name.isnull().sum()
(2398, 103)
dataset = dataset.dropna(axis = 0, subset = ['director_name'] )
dataset.num_critic_for_reviews.min(), dataset.num_critic_for_reviews.max(), dataset.num_critic_for_reviews.median()
(1.0, 813.0, 112.0)
num_critic_for_reviews_median = dataset['num_critic_for_reviews'].median()
dataset.num_critic_for_reviews.fillna(num_critic_for_reviews_median, inplace = True)
dataset.num_critic_for_reviews.isnull().sum()
0
dataset.duration.min(), dataset.duration.max(), dataset.duration.median()
(7.0, 330.0, 104.0)
duration_median = dataset.duration.median()
dataset.duration.fillna(duration_median, inplace = True)
dataset.duration.isnull().sum()
0
dataset.director_facebook_likes.min(), dataset.director_facebook_likes.max(), dataset.director_facebook_likes.median(),dataset.director_facebook_likes.mean()
(0.0, 23000.0, 49.0, 688.6790602655772)
director_facebook_likes_mean = dataset.director_facebook_likes.mean()
dataset.director_facebook_likes.fillna(director_facebook_likes_mean, inplace = True)
dataset.director_facebook_likes.isnull().sum()
0
dataset.actor_3_facebook_likes.min(), dataset.actor_3_facebook_likes.max(), dataset.actor_3_facebook_likes.median(),dataset.actor_3_facebook_likes.mean()
(0.0, 23000.0, 372.0, 646.1009230769231)
actor_3_facebook_likes_mean = dataset.actor_3_facebook_likes.mean()
dataset.actor_3_facebook_likes.fillna(actor_3_facebook_likes_mean, inplace = True)
dataset.actor_3_facebook_likes.isnull().sum()
0
dataset = dataset.dropna(axis = 0, subset = ['actor_2_name'])
dataset.actor_2_name.isnull().sum()
0
dataset.actor_1_facebook_likes.min(), dataset.actor_1_facebook_likes.max(), dataset.actor_1_facebook_likes.median(),dataset.actor_1_facebook_likes.mean()
(0.0, 640000.0, 991.0, 6670.408886158886)
actor_1_facebook_likes_mean = dataset.actor_1_facebook_likes.mean()
dataset.actor_1_facebook_likes.fillna(actor_1_facebook_likes_mean, inplace = True)
dataset.actor_1_facebook_likes.isnull().sum()
0
dataset.gross.describe()
count 4.115000e+03
mean 4.842949e+07
std 6.800274e+07
min 1.620000e+02
25% 5.354708e+06
50% 2.551750e+07
75% 6.242729e+07
max 7.605058e+08
Name: gross, dtype: float64
dataset.gross.isnull().sum()
769
dataset = dataset.dropna(axis = 0, subset = ['gross'])
dataset.gross.isnull().sum()
0
dataset.shape
(4115, 28)
dataset.isnull().sum()
color 0
director_name 0
num_critic_for_reviews 0
duration 0
director_facebook_likes 0
actor_3_facebook_likes 0
actor_2_name 0
actor_1_facebook_likes 0
gross 0
genres 0
actor_1_name 0
movie_title 0
num_voted_users 0
cast_total_facebook_likes 0
actor_3_name 7
facenumber_in_poster 7
plot_keywords 39
movie_imdb_link 0
num_user_for_reviews 1
language 3
country 0
content_rating 60
budget 263
title_year 0
actor_2_facebook_likes 0
imdb_score 0
aspect_ratio 102
movie_facebook_likes 0
dtype: int64
dataset = dataset.dropna(axis = 0, subset = ['budget'])
dataset.budget.isnull().sum()
0
dataset.isnull().sum()
color 0
director_name 0
num_critic_for_reviews 0
duration 0
director_facebook_likes 0
actor_3_facebook_likes 0
actor_2_name 0
actor_1_facebook_likes 0
gross 0
genres 0
actor_1_name 0
movie_title 0
num_voted_users 0
cast_total_facebook_likes 0
actor_3_name 5
facenumber_in_poster 6
plot_keywords 30
movie_imdb_link 0
num_user_for_reviews 0
language 3
country 0
content_rating 48
budget 0
title_year 0
actor_2_facebook_likes 0
imdb_score 0
aspect_ratio 72
movie_facebook_likes 0
dtype: int64
dataset.shape
(3852, 28)
dataset = dataset.dropna(axis = 0, subset = ['actor_3_name'])
dataset.actor_3_name.isnull().sum()
0
facenumber_in_poster_median = dataset.facenumber_in_poster.median()
dataset.facenumber_in_poster.fillna(facenumber_in_poster_median, inplace = True)
dataset.facenumber_in_poster.isnull().sum()
0
dataset.plot_keywords.unique()
array(['avatar|future|marine|native|paraplegic',
'goddess|marriage ceremony|marriage proposal|pirate|singapore',
'bomb|espionage|sequel|spy|terrorist', ...,
'assassin|death|guitar|gun|mariachi',
'written and directed by cast member',
'actress name in title|crush|date|four word title|video camera'],
dtype=object)
dataset.language.unique()
array(['English', 'Mandarin', 'Aboriginal', 'Spanish', 'French',
'Filipino', 'Maya', 'Kazakh', 'Telugu', 'Cantonese', 'Japanese',
'Aramaic', 'Italian', 'Dutch', 'Dari', 'German', 'Mongolian',
'Thai', 'Bosnian', 'Korean', 'Hungarian', 'Hindi', nan,
'Icelandic', 'Danish', 'Portuguese', 'Norwegian', 'Czech',
'Russian', 'None', 'Zulu', 'Hebrew', 'Dzongkha', 'Arabic',
'Vietnamese', 'Indonesian', 'Romanian', 'Persian', 'Swedish'],
dtype=object)
dataset.language.value_counts()
English 3665
French 37
Spanish 26
Mandarin 14
German 13
Japanese 12
Hindi 10
Cantonese 8
Italian 7
Korean 5
Portuguese 5
Norwegian 4
Danish 3
Thai 3
Persian 3
Dutch 3
Aboriginal 2
Hebrew 2
Indonesian 2
Dari 2
Vietnamese 1
Dzongkha 1
Bosnian 1
Arabic 1
Kazakh 1
Mongolian 1
Romanian 1
Swedish 1
Filipino 1
Aramaic 1
Czech 1
Telugu 1
Hungarian 1
Icelandic 1
Maya 1
Zulu 1
Russian 1
None 1
Name: language, dtype: int64
language_mode = dataset.language.mode().iloc[0]
dataset.language.fillna(language_mode, inplace = True)
dataset.language.isnull().sum()
0
dataset = dataset.dropna(axis = 0, subset = ['plot_keywords'])
dataset.plot_keywords.isnull().sum()
0
dataset.content_rating.unique()
array(['PG-13', 'PG', 'G', 'R', 'Approved', 'NC-17', nan, 'X',
'Not Rated', 'Unrated', 'M', 'GP', 'Passed'], dtype=object)
dataset.content_rating.fillna('Not Rated', inplace = True)
dataset.aspect_ratio.unique()
array([ 1.78, 2.35, 1.85, 2. , 2.2 , 2.39, 2.24, 1.66, 1.5 ,
1.77, 2.4 , 1.37, nan, 2.76, 1.33, 1.18, 2.55, 1.75,
16. ])
aspect_ratio_mode = dataset.aspect_ratio.mode().iloc[0]
dataset.aspect_ratio.fillna(aspect_ratio_mode, inplace = True)
dataset.isnull().sum()
color 0
director_name 0
num_critic_for_reviews 0
duration 0
director_facebook_likes 0
actor_3_facebook_likes 0
actor_2_name 0
actor_1_facebook_likes 0
gross 0
genres 0
actor_1_name 0
movie_title 0
num_voted_users 0
cast_total_facebook_likes 0
actor_3_name 0
facenumber_in_poster 0
plot_keywords 0
movie_imdb_link 0
num_user_for_reviews 0
language 0
country 0
content_rating 0
budget 0
title_year 0
actor_2_facebook_likes 0
imdb_score 0
aspect_ratio 0
movie_facebook_likes 0
dtype: int64
dataset.reset_index(inplace = True, drop = True)
dataset.profile_report()
While handling the missing values, we lost about 25% of the original data. Next, let's convert the remaining data into numerical form to feed our model.
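The share of rows lost can be checked with simple arithmetic. The sketch below uses two shapes printed earlier in this notebook: 5043 rows at load time and 3852 rows after dropping records with a missing budget. The later drops on `actor_3_name` and `plot_keywords` remove a few more rows, so the final loss is slightly higher than this intermediate figure. The helper name `fraction_lost` is hypothetical.

```python
def fraction_lost(rows_before: int, rows_after: int) -> float:
    """Fraction of rows removed between two stages of cleaning."""
    return 1 - rows_after / rows_before

# Row counts taken from the dataset.shape outputs shown earlier.
loss = fraction_lost(5043, 3852)
print(f"{loss:.1%}")  # prints 23.6%
```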
numerical_cols, categorical_cols
(['num_critic_for_reviews', 'duration', 'director_facebook_likes', 'actor_3_facebook_likes', 'actor_1_facebook_likes', 'gross', 'num_voted_users', 'cast_total_facebook_likes', 'facenumber_in_poster', 'num_user_for_reviews', 'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score', 'aspect_ratio', 'movie_facebook_likes'], ['color', 'director_name', 'actor_2_name', 'genres', 'actor_1_name', 'movie_title', 'actor_3_name', 'plot_keywords', 'movie_imdb_link', 'language', 'country', 'content_rating'])
Let us deal with the categorical_cols first by converting them into numerical values.
dataset.color.unique(), dataset.color.nunique()
(array(['Color', ' Black and White'], dtype=object), 2)
As we can see, the color variable takes only 2 distinct values. We can simply map 'Color' to 1 and ' Black and White' to 0.
dataset['color'] = dataset.color.map({'Color' : 1 , ' Black and White' : 0})
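One detail worth noting in the mapping above: the raw value is `' Black and White'` with a leading space, and `Series.map` with a dictionary returns NaN for any value that is not a key. A slightly more defensive variant, sketched here on a toy series rather than the real column, strips whitespace first so the dictionary keys do not need to carry the stray space.

```python
import pandas as pd

# Toy series mimicking the raw 'color' values, including the leading space.
raw = pd.Series(["Color", " Black and White", "Color"], name="color")

# Strip surrounding whitespace, then map the cleaned labels to 1 and 0.
encoded = raw.str.strip().map({"Color": 1, "Black and White": 0})

# Any unmapped label would show up as NaN, so check for that explicitly.
assert encoded.notna().all()
print(encoded.tolist())
```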
dataset.director_name.unique(), dataset.director_name.nunique()
(array(['James Cameron', 'Gore Verbinski', 'Sam Mendes', ...,
'Kiyoshi Kurosawa', 'Shane Carruth', 'Neill Dela Llana'],
dtype=object), 1723)
director_name_value_counts = dataset.director_name.value_counts()
director_name_value_counts = pd.DataFrame(director_name_value_counts).reset_index().rename(columns = {'index': 'director_name', 'director_name':'director_name_value_counts'})
dataset = pd.merge(dataset, director_name_value_counts,left_on = 'director_name', right_on = 'director_name', how = 'left')
dataset = dataset.drop(columns = 'director_name')
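The steps above replace each director's name with the number of movies that director has in the dataset, which is often called frequency (or count) encoding. The rename in that cell relies on how `value_counts().reset_index()` names its columns, and to our knowledge that naming changed in pandas 2.0, so the cell may need adjusting on newer versions. A sketch that avoids the merge and rename entirely, shown on a toy frame with made-up names, is:

```python
import pandas as pd

# Toy frame standing in for the movie dataset (director names are made up).
toy = pd.DataFrame({"director_name": ["A", "B", "A", "C", "A"]})

# Count how many rows each director has, then look the count up per row.
counts = toy["director_name"].value_counts()
toy["director_name_value_counts"] = toy["director_name"].map(counts)

# Drop the original text column, as the notebook does after its merge.
toy = toy.drop(columns="director_name")
print(toy["director_name_value_counts"].tolist())
```

The same pattern would apply to the actor name columns handled next, with the column name swapped.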
dataset.actor_2_name.unique(), dataset.actor_2_name.nunique()
(array(['Joel David Moore', 'Orlando Bloom', 'Rory Kinnear', ...,
'Peter Marquardt', 'Caitlin FitzGerald', 'Brian Herzlinger'],
dtype=object), 2259)
actor_2_name_value_counts = dataset.actor_2_name.value_counts()
actor_2_name_value_counts = pd.DataFrame(actor_2_name_value_counts).reset_index().rename(columns = {'index': 'actor_2_name', 'actor_2_name':'actor_2_name_value_counts'})
dataset = pd.merge(dataset, actor_2_name_value_counts,left_on = 'actor_2_name', right_on = 'actor_2_name', how = 'left')
dataset = dataset.drop(columns = 'actor_2_name')
dataset.genres.unique(), dataset.genres.nunique()
(array(['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy',
'Action|Adventure|Thriller', 'Action|Thriller',
'Action|Adventure|Sci-Fi', 'Action|Adventure|Romance',
'Adventure|Animation|Comedy|Family|Fantasy|Musical|Romance',
'Adventure|Family|Fantasy|Mystery', 'Action|Adventure',
'Action|Adventure|Western', 'Action|Adventure|Family|Fantasy',
'Action|Adventure|Comedy|Family|Fantasy|Sci-Fi',
'Adventure|Fantasy', 'Action|Adventure|Drama|History',
'Adventure|Family|Fantasy', 'Action|Adventure|Drama|Romance',
'Drama|Romance', 'Action|Adventure|Sci-Fi|Thriller',
'Action|Adventure|Fantasy|Romance',
'Action|Adventure|Fantasy|Sci-Fi|Thriller',
'Adventure|Animation|Comedy|Family|Fantasy',
'Adventure|Animation|Comedy|Family|Sport', 'Action|Crime|Thriller',
'Action|Adventure|Horror|Sci-Fi|Thriller',
'Adventure|Animation|Family|Sci-Fi',
'Action|Comedy|Crime|Thriller', 'Animation|Drama|Family|Fantasy',
'Action|Crime|Drama|Thriller', 'Adventure|Animation|Comedy|Family',
'Action|Adventure|Animation|Comedy|Family|Sci-Fi',
'Adventure|Drama|Family|Mystery', 'Action|Comedy|Sci-Fi|Western',
'Action|Adventure|Fantasy|Horror|Thriller',
'Action|Adventure|Comedy|Sci-Fi', 'Comedy|Family|Fantasy',
'Adventure|Animation|Comedy|Drama|Family|Fantasy',
'Adventure|Drama|Family|Fantasy', 'Action|Adventure|Drama|Fantasy',
'Action|Adventure|Family|Fantasy|Romance',
'Action|Adventure|Drama|Sci-Fi',
'Action|Adventure|Family|Mystery|Sci-Fi',
'Action|Adventure|Animation|Comedy|Drama|Family|Sci-Fi',
'Adventure|Animation|Comedy|Family|Sci-Fi',
'Adventure|Animation|Family|Fantasy', 'Action|Sci-Fi',
'Adventure|Drama|Sci-Fi', 'Drama|Fantasy|Romance',
'Adventure|Sci-Fi', 'Action|Adventure|Drama|Thriller',
'Action|Drama|History|Romance|War',
'Action|Adventure|Biography|Drama|History|Romance|War',
'Action|Drama', 'Drama|Horror|Sci-Fi',
'Adventure|Comedy|Family|Fantasy',
'Animation|Comedy|Family|Fantasy',
'Action|Adventure|Animation|Comedy|Family',
'Adventure|Animation|Comedy|Family|Fantasy|Musical',
'Mystery|Thriller', 'Adventure|Animation|Comedy|Drama|Family',
'Action|Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi',
'Comedy|Fantasy|Horror', 'Drama|Fantasy|Horror|Thriller',
'Action|Drama|Thriller', 'Adventure',
'Action|Comedy|Fantasy|Sci-Fi',
'Action|Adventure|Comedy|Family|Fantasy|Mystery|Sci-Fi',
'Action|Adventure|Animation|Fantasy', 'Comedy|Crime',
'Action|Drama|History|War', 'Action|Adventure|Drama',
'Action|Adventure|Animation|Comedy|Family|Fantasy',
'Action|Drama|Mystery|Sci-Fi', 'Action|Adventure|Comedy|Thriller',
'Action|Adventure|Animation|Fantasy|Romance|Sci-Fi',
'Action|Adventure|Drama|History|War',
'Adventure|Drama|Fantasy|Romance',
'Animation|Comedy|Family|Musical',
'Adventure|Drama|Thriller|Western',
'Adventure|Animation|Comedy|Family|Western',
'Action|Mystery|Thriller', 'Adventure|Sci-Fi|Thriller',
'Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi',
'Action|Crime|Mystery|Thriller', 'Action|Adventure|Family|Mystery',
'Adventure|Drama|Romance|War',
'Adventure|Animation|Family|Thriller',
'Action|Animation|Comedy|Family|Sci-Fi', 'Action|Comedy|Fantasy',
'Adventure|Animation|Comedy|Family|Musical',
'Action|Adventure|Crime|Mystery|Thriller',
'Action|Adventure|History', 'Action', 'Adventure|Drama|Fantasy',
'Action|Fantasy|Thriller', 'Action|Adventure|Comedy|Crime',
'Adventure|Mystery|Sci-Fi', 'Action|Drama|Sci-Fi|Thriller',
'Action|Crime|Sci-Fi|Thriller', 'Action|Family|Sport',
'Comedy|Drama|Romance', 'Action|Comedy|Romance',
'Action|Adventure|Mystery|Sci-Fi', 'Action|Drama|War',
'Adventure|Drama|Sci-Fi|Thriller',
'Action|Adventure|Comedy|Family|Fantasy', 'Crime|Thriller',
'Action|Comedy|Crime|Romance|Thriller', 'Biography|Drama',
'Action|Comedy|Crime|Sci-Fi|Thriller', 'Action|Drama|Fantasy|War',
'Animation|Comedy|Family|Music|Western',
'Action|Adventure|Mystery|Sci-Fi|Thriller',
'Action|Drama|Sci-Fi|Sport', 'Action|Crime|Romance|Thriller',
'Action|Adventure|Comedy', 'Biography|Drama|Sport',
'Action|Mystery|Sci-Fi|Thriller',
'Animation|Family|Fantasy|Musical|Romance',
'Action|Adventure|Romance|Sci-Fi|Thriller', 'Comedy|Romance',
'Action|Drama|Romance', 'Biography|Crime|Drama|History|Romance',
'Biography|Crime|Drama', 'Action|Comedy|Thriller',
'Action|Comedy|Crime', 'Action|Drama|Mystery|Thriller',
'Drama|Western', 'Animation|Drama|Family|Musical|Romance',
'Action|Adventure|Comedy|Family|Mystery',
'Action|Romance|Thriller', 'Action|Fantasy|Horror|Mystery',
'Adventure|Drama|Thriller', 'Biography|Comedy|Crime|Drama',
'Action|Sci-Fi|War', 'Drama|Sci-Fi',
'Action|Adventure|Animation|Family|Fantasy',
'Action|Crime|Fantasy|Romance|Thriller', 'Adventure|Comedy|Sci-Fi',
'Action|Crime|Sport|Thriller',
'Action|Adventure|Biography|Drama|History|Thriller',
'Action|Comedy|Sci-Fi', 'Action|Drama|Thriller|War',
'Drama|Mystery|Thriller', 'Action|Adventure|Fantasy|Thriller',
'Crime|Drama', 'Drama|History|Romance|War',
'Animation|Comedy|Family|Sport', 'Comedy|Sci-Fi|Thriller',
'Drama|History|War', 'Comedy',
'Adventure|Animation|Comedy|Family|Romance',
'Drama|Family|Fantasy|Romance', 'Drama|Fantasy|Thriller',
'Drama|Mystery|Romance|Sci-Fi|Thriller',
'Drama|History|War|Western', 'Action|Adventure|Animation|Family',
'Adventure|Comedy|Family|Mystery|Sci-Fi',
'Drama|Fantasy|Horror|Mystery|Thriller',
'Animation|Comedy|Family|Sci-Fi',
'Adventure|Comedy|Drama|Fantasy|Romance',
'Action|Adventure|Comedy|Crime|Thriller', 'Crime|Drama|Thriller',
'Adventure|Animation|Family|Fantasy|Musical|War', 'Action|Comedy',
'Crime|Drama|Mystery|Thriller',
'Action|Adventure|Animation|Family|Fantasy|Sci-Fi',
'Adventure|Animation|Comedy|Family|Fantasy|Music',
'Drama|History|Thriller|War', 'Action|Animation|Comedy|Sci-Fi',
'Comedy|Family|Fantasy|Horror|Mystery',
'Drama|Mystery|Sci-Fi|Thriller', 'Action|Horror|Sci-Fi|Thriller',
'Crime|Mystery|Thriller',
'Action|Adventure|Comedy|Crime|Mystery|Thriller',
'Comedy|Drama|Sci-Fi', 'Action|Family|Fantasy|Musical',
'Drama|History|Sport', 'Adventure|Drama|Romance',
'Animation|Comedy|Family|Music|Romance',
'Animation|Comedy|Family|Fantasy|Musical|Romance',
'Adventure|Comedy|Family', 'Action|Crime|Drama|Mystery|Thriller',
'Action|Adventure|Comedy|Fantasy',
'Adventure|Comedy|Drama|Family|Fantasy',
'Action|Comedy|Fantasy|Romance', 'Comedy|Romance|Sci-Fi',
'Adventure|Comedy|Mystery', 'Comedy|Drama|Fantasy|Romance',
'Action|Comedy|Family|Fantasy',
'Action|Adventure|Fantasy|Horror|Sci-Fi',
'Crime|Drama|History|Mystery|Thriller', 'Comedy|Drama',
'Adventure|Animation|Comedy|Drama|Family|Fantasy|Sci-Fi',
'Action|Drama|Romance|Sci-Fi|Thriller', 'Comedy|Crime|Sport',
'Comedy|Family|Fantasy|Romance',
'Adventure|Drama|History|Romance|War', 'Comedy|Family|Sci-Fi',
'Fantasy|Horror|Mystery|Thriller',
'Adventure|Animation|Comedy|Family|Fantasy|Sci-Fi|Sport',
'Adventure|Comedy|Crime|Family|Mystery', 'Drama|Sci-Fi|Thriller',
'Action|Crime|Mystery|Romance|Thriller',
'Action|Adventure|Comedy|Romance',
'Adventure|Animation|Family|Western', 'Comedy|Family|Romance',
'Action|Adventure|Family|Sci-Fi|Thriller',
'Animation|Family|Fantasy|Music',
'Action|Adventure|Family|Fantasy|Thriller', 'Comedy|Fantasy',
'Action|Adventure|Comedy|Fantasy|Thriller',
'Action|Sci-Fi|Thriller', 'Drama|History|Thriller',
'Adventure|Animation|Family', 'Drama|Musical|Romance',
'Documentary|Drama', 'Action|Adventure|Drama|History|Romance',
'Adventure|Animation|Drama|Family|Musical',
'Animation|Comedy|Family|Fantasy|Sci-Fi',
'Adventure|Animation|Drama|Family|Fantasy', 'Sci-Fi|Thriller',
'Animation|Comedy|Family', 'Action|Crime|Fantasy|Thriller',
'Comedy|Drama|Family|Music|Musical|Romance',
'Horror|Mystery|Thriller', 'Action|Adventure|Comedy|Family|Sci-Fi',
'Comedy|Family', 'Biography|Comedy|Drama|History',
'Drama|Music|Musical', 'Comedy|Crime|Music',
'Action|Comedy|Romance|Thriller',
'Animation|Comedy|Family|Fantasy|Mystery',
'Comedy|Crime|Drama|Romance', 'Action|Adventure|Romance|Thriller',
'Drama|History|Romance', 'Action|Drama|Fantasy|Romance',
'Action|Adventure|Animation|Family|Sci-Fi', 'Action|Drama|Sci-Fi',
'Animation|Comedy|Fantasy', 'Action|Fantasy',
'Action|Animation|Comedy|Family',
'Action|Adventure|Comedy|Romance|Thriller', 'Action|Comedy|Sport',
'Biography|Drama|History|War', 'Adventure|Animation|Comedy',
'Action|Drama|Sport', 'Adventure|Drama|Family',
'Drama|Mystery|Romance|Thriller',
'Adventure|Animation|Comedy|Family|Fantasy|Romance',
'Adventure|Drama|War', 'Action|Adventure|Crime|Thriller',
'Fantasy|Mystery|Romance|Sci-Fi|Thriller',
'Drama|Fantasy|Mystery|Thriller',
'Animation|Comedy|Family|Fantasy|Music',
'Drama|Horror|Romance|Thriller', 'Drama|War', 'Drama',
'Action|Drama|Fantasy|Horror|War',
'Adventure|Family|Fantasy|Romance',
'Adventure|Biography|Drama|History|War',
'Action|Adventure|Horror|Sci-Fi', 'Action|Fantasy|Horror',
'Comedy|Drama|Musical|Romance', 'Action|Sci-Fi|Sport',
'Action|Adventure|Animation|Comedy|Crime|Family|Fantasy',
'Adventure|Animation|Family|Fantasy|Musical',
'Action|Crime|Mystery|Sci-Fi|Thriller',
'Action|Comedy|Crime|Drama|Thriller',
'Adventure|Drama|History|Romance', 'Biography|Drama|Thriller',
'Action|Adventure|Fantasy|War', 'Comedy|Fantasy|Romance',
'Drama|Horror|Sci-Fi|Thriller', 'Adventure|Drama|History',
'Action|Adventure|Comedy|Romance|Thriller|Western',
'Biography|Drama|Sport|War', 'Comedy|Drama|Family|Musical',
'Action|Adventure|Fantasy|Horror|Sci-Fi|Thriller', 'Drama|Sport',
'Action|Fantasy|Sci-Fi|Thriller', 'Drama|Mystery|Romance',
'Adventure|Biography|Drama|History|Sport|Thriller',
'Crime|Drama|Fantasy', 'Adventure|Biography|Crime|Drama|Western',
'Action|War', 'Comedy|Romance|Sport',
'Crime|Drama|Mystery|Thriller|Western', 'Comedy|Sport',
'Comedy|Drama|Family', 'Crime|Drama|Fantasy|Mystery',
'Adventure|Animation|Biography|Drama|Family|Fantasy|Musical',
'Drama|Romance|Western', 'Documentary|Music', 'Drama|Thriller',
'Animation|Family|Fantasy', 'Action|Fantasy|Horror|Sci-Fi',
'Biography|Comedy|Drama', 'Action|Horror|Sci-Fi',
'Adventure|Comedy', 'Biography|Drama|History|Sport',
'Comedy|Crime|Romance|Thriller', 'Comedy|Crime|Romance',
'Horror|Mystery|Sci-Fi|Thriller', 'Biography|Drama|Music',
'Drama|Fantasy|Sport', 'Adventure|Comedy|Drama|Music',
'Action|Fantasy|Horror|Sci-Fi|Thriller',
'Adventure|Animation|Comedy|Drama|Family|Fantasy|Romance',
'Horror|Sci-Fi|Thriller', 'Drama|Fantasy|Mystery|Romance|Thriller',
'Action|Adventure|Drama|History|Romance|War',
'Drama|Fantasy|Mystery|Romance', 'Fantasy|Horror|Mystery|Romance',
'Adventure|Comedy|Family|Romance|Sci-Fi', 'Drama|Horror|Thriller',
'Action|Comedy|Mystery|Romance',
'Action|Adventure|Comedy|Romance|Sci-Fi',
'Action|Biography|Drama|History|Thriller|War',
'Adventure|Comedy|Family|Fantasy|Horror',
'Comedy|Family|Romance|Sci-Fi', 'Action|Adventure|Thriller|War',
'Comedy|Drama|Romance|Sport', 'Action|Comedy|Crime|Drama',
'Drama|Music|Romance|War', 'Action|Comedy|Drama|Family|Thriller',
'Action|Crime',
'Adventure|Animation|Drama|Family|History|Musical|Romance',
'Action|Adventure|Drama|Romance|Sci-Fi',
'Action|Adventure|Comedy|Family|Romance',
'Action|Adventure|Comedy|Western',
'Biography|Drama|History|Musical',
'Adventure|Drama|Horror|Thriller', 'Action|Drama|Sport|Thriller',
'Drama|Musical|Romance|Thriller', 'Comedy|Drama|Family|Fantasy',
'Adventure|Comedy|Crime|Family|Musical',
'Drama|Music|Musical|Romance', 'Drama|Mystery|Romance|War',
'Action|Adventure|Romance|Sci-Fi',
'Adventure|Animation|Drama|Family|Fantasy|Musical|Mystery|Romance',
'Action|Horror|Thriller', 'Drama|History|Horror',
'Drama|Romance|Sport', 'Comedy|Family|Musical|Romance',
'Romance|Sci-Fi|Thriller', 'Biography|Comedy|Drama|Romance',
'Mystery|Sci-Fi|Thriller', 'Drama|Fantasy|Horror',
'Adventure|Comedy|Drama|Fantasy|Musical',
'Action|Adventure|Family|Fantasy|Sci-Fi|Thriller',
'Adventure|Comedy|Family|Fantasy|Romance|Sport',
'Adventure|Horror|Mystery', 'Crime|Drama|Romance|Thriller',
'Comedy|Crime|Drama|Thriller', 'Drama|Fantasy',
'Adventure|Comedy|Drama', 'Action|Biography|Drama|History|War',
'Adventure|Comedy|Fantasy', 'Adventure|Comedy|Crime|Drama|Family',
'Action|Biography|Crime|Drama|Thriller', 'Comedy|Sci-Fi',
'Action|Adventure|Comedy|Crime|Music|Mystery',
'Action|Crime|Drama|Sci-Fi|Thriller',
'Action|Adventure|Comedy|Drama|War', 'Drama|Mystery|Sci-Fi',
'Crime|Drama|Music', 'Adventure|Crime|Drama|Western',
'Comedy|Drama|Thriller',
'Action|Comedy|Crime|Music|Romance|Thriller',
'Crime|Romance|Thriller', 'Action|Adventure|Drama|Sci-Fi|Thriller',
'Action|Drama|Fantasy|Thriller|Western',
'Action|Drama|Mystery|Thriller|War', 'Action|Comedy|Crime|Romance',
'Action|Adventure|Family|Fantasy|Sci-Fi',
'Adventure|Comedy|Family|Musical', 'Action|Horror',
'Action|Adventure|Horror|Thriller', 'Comedy|Drama|Music|Romance',
'Action|Crime|Drama|Romance|Thriller',
'Comedy|Family|Romance|Sport', 'Drama|Family|Fantasy',
'Drama|Fantasy|Musical|Romance',
'Adventure|Comedy|Family|Fantasy|Sci-Fi', 'Comedy|Musical',
'Biography|Drama|History', 'Action|Crime|Drama|Thriller|War',
'Comedy|Crime|Thriller', 'Biography|Drama|History|Thriller',
'Action|Adventure|Crime|Drama|Mystery|Thriller',
'Animation|Family|Fantasy|Musical', 'Adventure|Drama|Western',
'Biography|Drama|History|Romance', 'Drama|Horror|Mystery|Thriller',
'Action|Fantasy|Western', 'Drama|Music',
'Action|Drama|Family|Sport', 'Action|Biography|Drama|Thriller|War',
'Comedy|Drama|Sport', 'Horror|Mystery',
'Adventure|Comedy|Sci-Fi|Western', 'Fantasy|Horror|Romance',
'Biography|Drama|Romance', 'Action|Adventure|Drama|Romance|War',
'Adventure|Comedy|Crime|Romance',
'Comedy|Drama|Family|Fantasy|Romance', 'Horror',
'Action|Adventure|Drama|Romance|Thriller',
'Biography|Drama|Music|Musical', 'Drama|History', 'Comedy|Western',
'Action|Adventure|Crime|Fantasy|Mystery|Thriller',
'Adventure|Drama|Mystery', 'Biography|Crime|Drama|Music',
'Crime|Drama|Horror|Thriller', 'Horror|Thriller',
'Adventure|Animation|Comedy|Drama|Family|Fantasy|Musical',
'Action|Adventure|Comedy|Music|Thriller',
'Adventure|Animation|Comedy|Crime|Family',
'Comedy|Romance|Sci-Fi|Thriller', 'Comedy|Crime|Family|Romance',
'Crime|Horror|Thriller', 'Action|Horror|Mystery|Sci-Fi|Thriller',
'Comedy|Fantasy|Sci-Fi',
'Adventure|Animation|Comedy|Fantasy|Romance',
'Action|Adventure|Family|Thriller',
'Adventure|Comedy|Drama|Romance|Thriller|War',
'Action|Drama|Fantasy', 'Action|Adventure|Drama|Fantasy|War',
'Drama|Fantasy|Romance|Sci-Fi',
'Animation|Comedy|Family|Horror|Sci-Fi',
'Biography|Drama|Romance|Sport', 'Action|Biography|Drama',
'Adventure|Drama', 'Horror|Mystery|Sci-Fi',
'Action|Adventure|Drama|Thriller|Western',
'Adventure|Family|Fantasy|Sci-Fi', 'Action|Biography|Drama|Sport',
'Drama|Family',
'Action|Adventure|Crime|Drama|Family|Fantasy|Romance|Thriller',
'Biography|Comedy|Romance', 'Action|Biography|Drama|History',
'Biography|Drama|War', 'Drama|Romance|War',
'Adventure|Comedy|Family|Sci-Fi',
'Biography|Drama|Family|History|Sport',
'Biography|Comedy|Drama|History|Music', 'Fantasy|Horror',
'Comedy|Drama|Romance|Sci-Fi',
'Adventure|Animation|Comedy|Family|War',
'Action|Comedy|Sci-Fi|Thriller', 'Comedy|Horror',
'Drama|Thriller|War', 'Comedy|Music', 'Action|Western',
'Action|Adventure|Family|Sci-Fi',
'Adventure|Biography|Drama|Thriller', 'Drama|Romance|War|Western',
'Action|Adventure|Comedy|Drama|Thriller', 'Drama|Music|Romance',
'Action|Adventure|Crime|Drama|Thriller',
'Crime|Horror|Mystery|Thriller', 'Adventure|Comedy|Family|Sport',
'Comedy|Drama|Fantasy', 'Comedy|Family|Sport',
'Action|Adventure|Drama|Family', 'Drama|Family|Sport',
'Action|Thriller|Western', 'Action|Drama|Fantasy|Horror|Thriller',
'Animation|Comedy|Family|Fantasy|Musical',
'Action|Crime|Drama|Mystery|Sci-Fi|Thriller',
'Adventure|Comedy|Crime|Drama', 'Drama|Mystery',
'Comedy|Fantasy|Horror|Thriller',
'Crime|Drama|Mystery|Sci-Fi|Thriller', 'Comedy|Crime|Musical',
'Comedy|Drama|Family|Music|Romance', 'Comedy|Horror|Romance',
'Comedy|Family|Fantasy|Sport',
'Animation|Comedy|Family|Mystery|Sci-Fi',
'Animation|Drama|Family|Fantasy|Musical|Romance',
'Comedy|Horror|Musical|Sci-Fi', 'Crime|Drama|Sport',
'Action|Adventure|Animation|Drama|Mystery|Sci-Fi|Thriller',
'Action|Adventure|Crime|Drama|Romance', 'Action|Comedy|Horror',
'Adventure|Horror|Thriller', 'Adventure|Fantasy|Mystery',
'Biography|Crime|Drama|History|Western',
'Action|Biography|Crime|Drama', 'Biography|Drama|Music|Romance',
'Biography|Crime|Drama|History|Music',
'Adventure|Animation|Comedy|Drama|Family|Musical',
'Comedy|Drama|Music', 'Drama|Romance|Thriller',
'Action|Fantasy|Horror|Thriller', 'Adventure|Biography',
'Action|Comedy|Family', 'Action|Horror|Romance',
'Action|Comedy|Crime|Music',
'Action|Drama|Fantasy|Mystery|Sci-Fi|Thriller',
'Action|Crime|Drama|History|Western', 'Comedy|Crime|Drama',
'Comedy|Family|Fantasy|Music|Romance',
'Adventure|Comedy|Crime|Music',
'Action|Adventure|Comedy|Sci-Fi|Thriller',
'Action|Crime|Drama|Western',
'Action|Adventure|Comedy|Family|Romance|Sci-Fi',
'Action|Fantasy|Romance|Sci-Fi', 'Comedy|Crime|Mystery|Romance',
'Adventure|Family', 'Comedy|Drama|Family|Romance',
'Action|Drama|Music|Romance',
'Adventure|Comedy|Family|Fantasy|Horror|Mystery',
'Action|Biography|Drama|History|Romance|Western',
'Biography|Drama|Family',
'Action|Adventure|Comedy|Crime|Family|Romance|Thriller',
'Drama|Romance|Sci-Fi', 'Comedy|Fantasy|Horror|Romance',
'Comedy|Family|Music', 'Action|Comedy|Music',
'Adventure|Comedy|Crime', 'Biography|Comedy|Drama|Sport',
'Fantasy|Horror|Thriller', 'Comedy|Drama|Romance|Thriller',
'Adventure|Comedy|Family|Romance',
'Adventure|Family|Fantasy|Musical',
'Biography|Crime|Drama|History|Thriller', 'Crime|Drama|History',
'Biography|Drama|Thriller|War',
'Drama|Music|Mystery|Romance|Thriller',
'Action|Adventure|Fantasy|Horror', 'Crime|Drama|Mystery|Romance',
'Action|Drama|Western', 'Comedy|War',
'Adventure|Comedy|Family|Fantasy|Music|Sci-Fi',
'Adventure|Family|Fantasy|Music|Musical',
'Action|Adventure|Animation|Comedy|Fantasy',
'Adventure|Comedy|Horror|Sci-Fi', 'Horror|Sci-Fi',
'Biography|Comedy|Drama|Family|Sport',
'Action|Crime|Drama|Thriller|Western',
'Drama|Fantasy|Romance|Thriller', 'Comedy|Mystery',
'Comedy|Drama|Musical|Romance|War',
'Drama|History|Music|Romance|War', 'Comedy|History',
'Animation|Comedy|Fantasy|Musical', 'Action|Comedy|Documentary',
'Adventure|Comedy|Drama|Family|Romance',
'Adventure|Comedy|Drama|Family|Mystery',
'Drama|Family|Music|Romance', 'Fantasy|Romance',
'Adventure|Animation|Family|Musical',
'Animation|Comedy|Drama|Family|Musical',
'Biography|Crime|Drama|History',
'Adventure|Comedy|Fantasy|Music|Sci-Fi',
'Action|Adventure|Drama|Mystery',
'Comedy|Crime|Family|Mystery|Romance|Thriller',
'Action|Adventure|Drama|Romance|Western',
'Adventure|Crime|Mystery|Sci-Fi|Thriller',
'Adventure|Biography|Drama',
'Adventure|Drama|Horror|Mystery|Thriller', 'Crime|Fantasy|Horror',
'Animation|Family|Fantasy|Mystery', 'Action|Comedy|Crime|Fantasy',
'Comedy|Family|Music|Musical',
'Drama|Mystery|Romance|Thriller|War', 'Action|Crime|Drama|Sport',
'Drama|Fantasy|Horror|Mystery', 'Comedy|Drama|Music|War',
'Comedy|Musical|Romance', 'Comedy|Crime|Drama|Mystery|Romance',
'Biography|Comedy|Drama|History|Music|Musical',
'Animation|Drama|Mystery|Sci-Fi|Thriller',
'Adventure|Comedy|Drama|Romance', 'Adventure|Animation|Fantasy',
'Comedy|Drama|Mystery|Romance|Thriller|War',
'Biography|Comedy|Musical', 'Crime|Drama|Western',
'Action|Adventure|Animation|Family|Sci-Fi|Thriller',
'Comedy|Family|Fantasy|Sci-Fi',
'Action|Comedy|Crime|Fantasy|Horror|Mystery|Sci-Fi|Thriller',
'Crime|Drama|Mystery', 'Adventure|Comedy|Romance',
'Family|Fantasy|Music', 'Crime|Drama|Music|Thriller',
'Action|Drama|Fantasy|Mystery|Thriller',
'Biography|Drama|History|Music', 'Biography|Drama|Family|Sport',
'Adventure|Fantasy|Mystery|Thriller',
'Biography|Drama|Romance|War',
'Action|Horror|Romance|Sci-Fi|Thriller',
'Action|Drama|History|Romance|War|Western',
'Action|Animation|Sci-Fi|Thriller',
'Action|Animation|Comedy|Crime|Family',
'Drama|Family|Music|Musical', 'Drama|Family|Musical|Romance',
'Comedy|Drama|Family|Fantasy|Sci-Fi', 'Comedy|Music|Romance',
'Adventure|Comedy|Family|Fantasy|Musical',
'Adventure|Crime|Drama|Romance', 'Biography|Crime|Drama|Thriller',
'Comedy|Mystery|Sci-Fi|Thriller', 'Drama|Fantasy|War',
'Action|Comedy|Crime|Family', 'Action|Comedy|Mystery',
'Comedy|Crime|Mystery', 'Action|Crime|Sci-Fi',
'Comedy|Horror|Sci-Fi', 'Drama|Family|Romance',
'Adventure|Comedy|Family|Music|Romance', 'Comedy|Horror|Thriller',
'Comedy|Family|Music|Romance',
'Adventure|Fantasy|Horror|Mystery|Thriller',
'Crime|Drama|Musical|Romance', 'Family|Music|Romance',
'Biography|Drama|History|Thriller|War',
'Adventure|Crime|Drama|Mystery|Western',
'Comedy|Crime|Drama|Thriller|War', 'Fantasy|Horror|Mystery',
'Action|Comedy|Drama|War', 'Comedy|Drama|Fantasy|Music|Romance',
'Adventure|Mystery|Thriller', 'Comedy|Drama|War',
'Comedy|Mystery|Romance', 'Biography|Crime|Drama|War',
'Biography|Comedy|Drama|War', 'Comedy|Crime|Family|Sci-Fi',
'Adventure|Family|Sci-Fi', 'Adventure|Comedy|Romance|Sci-Fi',
'Action|Adventure|Comedy|Family',
'Biography|Comedy|Crime|Drama|Romance', 'Crime|Drama|Musical',
'Comedy|Drama|Family|Sport', 'Animation|Comedy|Crime|Drama|Family',
'Action|Adventure|Comedy|Fantasy|Mystery',
'Action|Adventure|Drama|Thriller|War', 'Crime|Drama|Music|Romance',
'Adventure|Animation|Comedy|Crime',
'Adventure|Comedy|Fantasy|Sci-Fi',
'Comedy|Drama|Family|Fantasy|Musical', 'Comedy|Crime|Family',
'Adventure|Drama|Thriller|War', 'Comedy|Drama|Horror|Sci-Fi',
'Crime|Drama|Romance', 'Drama|Fantasy|Music|Romance',
'Family|Sci-Fi', 'Drama|History|Romance|Western',
'Action|Comedy|War', 'Adventure|Comedy|Music|Sci-Fi',
'Drama|Family|Musical', 'Action|Comedy|Drama|Music',
'Adventure|Comedy|Drama|Fantasy', 'Fantasy|Horror|Sci-Fi',
'Comedy|Romance|Thriller', 'Biography|Crime|Drama|Romance',
'Adventure|Comedy|Drama|Romance|Sci-Fi',
'Drama|Music|Mystery|Romance', 'Action|Crime|Drama',
'Adventure|Biography|Drama|War', 'Action|Comedy|Drama',
'Action|Drama|Romance|Thriller',
'Action|Biography|Drama|History|Romance|War',
'Horror|Musical|Sci-Fi', 'Biography|Drama|Family|Musical|Romance',
'Comedy|Crime|Drama|Romance|Thriller', 'Drama|Horror',
'Animation|Comedy|Drama|Romance', 'Comedy|Crime|Musical|Romance',
'Comedy|Crime|Musical|Mystery', 'Action|Animation|Sci-Fi',
'Drama|Romance|Sci-Fi|Thriller', 'Animation|Biography|Drama|War',
'Crime|Horror', 'Adventure|Biography|Drama|History',
'Action|Crime|Horror|Sci-Fi|Thriller', 'Western',
'Drama|Mystery|War', 'Comedy|Drama|Musical',
'Mystery|Romance|Thriller', 'Adventure|Comedy|Drama|Family',
'Musical|Romance', 'Documentary|Drama|War',
'Biography|Crime|Drama|Western', 'Comedy|Family|Fantasy|Musical',
'Crime|Drama|Musical|Romance|Thriller',
'Fantasy|Horror|Romance|Thriller', 'Comedy|Drama|Music|Musical',
'Action|Sport', 'Action|Comedy|Drama|Thriller',
'Drama|Horror|Mystery|Sci-Fi|Thriller', 'Comedy|Documentary',
'Adventure|Horror', 'Documentary',
'Biography|Crime|Drama|Romance|Thriller',
'Comedy|Crime|Drama|Mystery|Thriller',
'Biography|Crime|Drama|Mystery|Thriller',
'Crime|Horror|Music|Thriller', 'Crime|Thriller|War',
'Comedy|Drama|Romance|War', 'Drama|Musical', 'Fantasy|Thriller',
'Crime|Drama|Fantasy|Romance', 'Comedy|Horror|Mystery',
'Adventure|War|Western',
'Biography|Comedy|Musical|Romance|Western',
'Adventure|Comedy|Musical|Romance',
'Action|Adventure|Comedy|Musical', 'Comedy|Drama|Fantasy|Horror',
'Crime|Documentary|Drama', 'Biography|Comedy|Documentary',
'Comedy|Documentary|Music', 'Crime|Drama|History|Romance',
'Comedy|Drama|Horror', 'Drama|Family|Western',
'Comedy|Crime|Drama|Sci-Fi', 'Comedy|Family|Musical|Romance|Short',
'Comedy|Documentary|War', 'Action|Comedy|Horror|Sci-Fi',
'Animation|Comedy|Drama',
'Animation|Biography|Documentary|Drama|History|War',
'Documentary|War', 'Documentary|History',
'Biography|Documentary|History',
'Action|Adventure|Comedy|Drama|Music|Sci-Fi',
'Crime|Drama|Film-Noir|Mystery|Thriller',
'Comedy|Fantasy|Musical|Sci-Fi',
'Biography|Crime|Documentary|History|Thriller',
'Adventure|Comedy|Horror', 'Adventure|Comedy|Sport',
'Action|Drama|Horror|Thriller', 'Comedy|Horror|Musical',
'Biography|Crime|Documentary|History', 'Crime|Documentary|War',
'Documentary|Sport', 'Adventure|Biography|Documentary|Drama',
'Thriller', 'Comedy|Fantasy|Thriller', 'Drama|Fantasy|Sci-Fi',
'Action|Adventure|Drama|War',
'Action|Adventure|Animation|Comedy|Fantasy|Sci-Fi',
'Documentary|Drama|Sport', 'Documentary|History|Music',
'Adventure|Family|Romance',
'Adventure|Biography|Drama|Horror|Thriller',
'Biography|Documentary|Sport',
'Action|Biography|Documentary|Sport',
'Comedy|Fantasy|Horror|Musical', 'Biography|Documentary',
'Action|Fantasy|Horror|Mystery|Thriller',
'Animation|Comedy|Drama|Fantasy|Sci-Fi', 'Sci-Fi',
'Adventure|Horror|Sci-Fi', 'Crime|Documentary',
'Comedy|Crime|Drama|Horror|Thriller', 'Comedy|Documentary|Drama',
'Comedy|Crime|Horror'], dtype=object), 758)
The genres column has a huge number of unique values (758). Let us split it into two features: main_genre, the first genre listed, and the full genres combination.
dataset['main_genre'] = dataset.genres.str.split('|').str[0]
dataset.main_genre.unique(), dataset.main_genre.nunique()
(array(['Action', 'Adventure', 'Drama', 'Animation', 'Comedy', 'Mystery',
'Crime', 'Biography', 'Fantasy', 'Documentary', 'Sci-Fi', 'Horror',
'Romance', 'Family', 'Western', 'Musical', 'Thriller'],
dtype=object), 17)
Let's convert both columns to numeric form: main_genre is label encoded, and genres is replaced by how often each combination occurs.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
dataset['main_genre'] = le.fit_transform(dataset.main_genre)
genres_value_counts = dataset.genres.value_counts()
# rename_axis + reset_index(name=...) yields the same column names on old and new pandas versions
genres_value_counts = genres_value_counts.rename_axis('genres').reset_index(name = 'genres_value_counts')
dataset = pd.merge(dataset, genres_value_counts,left_on = 'genres', right_on = 'genres', how = 'left')
dataset = dataset.drop(columns = 'genres')
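The count encoding used here (replace each category with the number of rows it appears in) can also be written without a merge. The following is a minimal sketch on a toy frame, not the notebook's own code; `toy` and its values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the movie data; the notebook itself works on `dataset`.
toy = pd.DataFrame(
    {"genres": ["Drama", "Comedy", "Drama", "Action|Thriller", "Drama"]}
)

# value_counts() returns a Series indexed by category,
# so map() can look up the count for every row directly.
counts = toy["genres"].value_counts()
toy["genres_value_counts"] = toy["genres"].map(counts)

encoded = toy["genres_value_counts"].tolist()
print(encoded)
```

Using `map` keeps the row order and index of the original frame, which a merge does not always guarantee.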
dataset.actor_1_name.unique(), dataset.actor_1_name.nunique()
(array(['CCH Pounder', 'Johnny Depp', 'Christoph Waltz', ...,
'Carlos Gallardo', 'Kerry Bishé', 'John August'], dtype=object), 1485)
The variable actor_1_name also has high cardinality, so we replace it with its value counts.
actor_1_name_value_counts = dataset.actor_1_name.value_counts()
# rename_axis + reset_index(name=...) yields the same column names on old and new pandas versions
actor_1_name_value_counts = actor_1_name_value_counts.rename_axis('actor_1_name').reset_index(name = 'actor_1_name_value_counts')
dataset = pd.merge(dataset, actor_1_name_value_counts,left_on = 'actor_1_name', right_on = 'actor_1_name', how = 'left')
dataset = dataset.drop(columns = 'actor_1_name')
dataset.movie_title.unique(), dataset.movie_title.nunique()
(array(['Avatar\xa0', "Pirates of the Caribbean: At World's End\xa0",
'Spectre\xa0', ..., 'El Mariachi\xa0', 'Newlyweds\xa0',
'My Date with Drew\xa0'], dtype=object), 3749)
As we can see, out of 3816 records there are 3749 unique titles, which is not helpful for making predictions. So we drop the column from our dataframe.
dataset = dataset.drop(columns = 'movie_title')
dataset.actor_3_name.unique(), dataset.actor_3_name.nunique()
(array(['Wes Studi', 'Jack Davenport', 'Stephanie Sigman', ...,
'Consuelo Gómez', 'Daniella Pineda', 'Jon Gunn'], dtype=object), 2661)
This variable also has high cardinality, so we replace it with its value counts.
actor_3_name_value_counts = dataset.actor_3_name.value_counts()
# rename_axis + reset_index(name=...) yields the same column names on old and new pandas versions
actor_3_name_value_counts = actor_3_name_value_counts.rename_axis('actor_3_name').reset_index(name = 'actor_3_name_value_counts')
dataset= pd.merge(dataset, actor_3_name_value_counts,left_on = 'actor_3_name', right_on = 'actor_3_name', how = 'left')
dataset = dataset.drop(columns = 'actor_3_name')
dataset.plot_keywords.unique(), dataset.plot_keywords.nunique()
(array(['avatar|future|marine|native|paraplegic',
'goddess|marriage ceremony|marriage proposal|pirate|singapore',
'bomb|espionage|sequel|spy|terrorist', ...,
'assassin|death|guitar|gun|mariachi',
'written and directed by cast member',
'actress name in title|crush|date|four word title|video camera'],
dtype=object), 3750)
Looking at this variable, we can see that it has very high cardinality (3750 unique values), which makes it unusable as is. So we extract the first keyword of each entry as main_plot_keyword and drop the original column.
dataset['main_plot_keyword'] = dataset.plot_keywords.str.split('|').str[0]
dataset = dataset.drop(columns = 'plot_keywords')
dataset.main_plot_keyword.unique(), dataset.main_plot_keyword.nunique()
(array(['avatar', 'goddess', 'bomb', ..., 'jihad',
'written and directed by cast member', 'actress name in title'],
dtype=object), 1688)
As we can see, the extracted main plot keyword still has high cardinality (1688 unique values), though lower than the full keyword string, so we replace it with its value counts.
main_plot_keyword_value_counts = dataset.main_plot_keyword.value_counts()
# rename_axis + reset_index(name=...) yields the same column names on old and new pandas versions
main_plot_keyword_value_counts = main_plot_keyword_value_counts.rename_axis('main_plot_keyword').reset_index(name = 'main_plot_keyword_value_counts')
dataset = pd.merge(dataset, main_plot_keyword_value_counts, left_on = 'main_plot_keyword', right_on = 'main_plot_keyword', how = 'left')
dataset = dataset.drop(columns = 'main_plot_keyword')
dataset.movie_imdb_link.unique(), dataset.movie_imdb_link.nunique()
(array(['http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1',
'http://www.imdb.com/title/tt0449088/?ref_=fn_tt_tt_1',
'http://www.imdb.com/title/tt2379713/?ref_=fn_tt_tt_1', ...,
'http://www.imdb.com/title/tt0104815/?ref_=fn_tt_tt_1',
'http://www.imdb.com/title/tt1880418/?ref_=fn_tt_tt_1',
'http://www.imdb.com/title/tt0378407/?ref_=fn_tt_tt_1'],
dtype=object), 3750)
The movie_imdb_link variable is unique for almost every record, so it will not help in predicting our target variable and we drop it.
dataset = dataset.drop(columns = 'movie_imdb_link')
dataset.language.unique(), dataset.language.nunique()
(array(['English', 'Mandarin', 'Aboriginal', 'Spanish', 'French',
'Filipino', 'Maya', 'Kazakh', 'Telugu', 'Cantonese', 'Japanese',
'Aramaic', 'Italian', 'Dutch', 'Dari', 'German', 'Mongolian',
'Thai', 'Bosnian', 'Korean', 'Hungarian', 'Hindi', 'Icelandic',
'Danish', 'Portuguese', 'Norwegian', 'Czech', 'Russian', 'None',
'Zulu', 'Hebrew', 'Dzongkha', 'Arabic', 'Vietnamese', 'Indonesian',
'Romanian', 'Persian', 'Swedish'], dtype=object), 38)
Language variable has only 38 unique values and is consistent. So, we just do label encoding.
from sklearn.preprocessing import LabelEncoder
le1 = LabelEncoder()
dataset['language'] = le1.fit_transform(dataset.language)
dataset.country.unique(), dataset.country.nunique()
(array(['USA', 'UK', 'New Zealand', 'Canada', 'Australia', 'Germany',
'China', 'New Line', 'France', 'Japan', 'Spain', 'Hong Kong',
'Czech Republic', 'Peru', 'South Korea', 'India', 'Aruba',
'Denmark', 'Ireland', 'South Africa', 'Italy', 'Romania', 'Chile',
'Netherlands', 'Hungary', 'Russia', 'Belgium', 'Greece', 'Taiwan',
'Official site', 'Thailand', 'Iran', 'West Germany', 'Georgia',
'Mexico', 'Iceland', 'Brazil', 'Finland', 'Norway', 'Argentina',
'Colombia', 'Poland', 'Israel', 'Indonesia', 'Afghanistan',
'Sweden', 'Philippines'], dtype=object), 47)
Country variable has only 47 unique values and is consistent. So, we just do label encoding.
from sklearn.preprocessing import LabelEncoder
le2 = LabelEncoder()
dataset['country'] = le2.fit_transform(dataset.country)
dataset.content_rating.unique(),dataset.content_rating.nunique()
(array(['PG-13', 'PG', 'G', 'R', 'Approved', 'NC-17', 'Not Rated', 'X',
'Unrated', 'M', 'GP', 'Passed'], dtype=object), 12)
Content rating has only 12 unique values, so we label encode it as well.
from sklearn.preprocessing import LabelEncoder
le3 = LabelEncoder()
dataset['content_rating'] = le3.fit_transform(dataset.content_rating)
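To make the label encoding above concrete, here is a small sketch of what `LabelEncoder` does with a handful of ratings. The list `ratings` and the name `le_demo` are illustrative only. The encoder sorts the distinct labels and assigns each one its position in that sorted order, so the resulting integers carry no meaning beyond alphabetical rank.

```python
from sklearn.preprocessing import LabelEncoder

# A few illustrative ratings (not the full column from the dataset).
ratings = ["PG-13", "PG", "G", "R", "PG-13"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(ratings)

# classes_ holds the sorted distinct labels; a label's code is its index there.
classes = list(le_demo.classes_)

# inverse_transform recovers the original strings from the integer codes.
decoded = list(le_demo.inverse_transform(codes))

print(classes, list(codes), decoded)
```

Keeping a separate fitted encoder per column, as the notebook does with `le1`, `le2` and `le3`, is what makes it possible to decode each column later.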
dataset.head().T
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| color | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.000000e+00 |
| num_critic_for_reviews | 7.230000e+02 | 3.020000e+02 | 6.020000e+02 | 8.130000e+02 | 4.620000e+02 |
| duration | 1.780000e+02 | 1.690000e+02 | 1.480000e+02 | 1.640000e+02 | 1.320000e+02 |
| director_facebook_likes | 0.000000e+00 | 5.630000e+02 | 0.000000e+00 | 2.200000e+04 | 4.750000e+02 |
| actor_3_facebook_likes | 8.550000e+02 | 1.000000e+03 | 1.610000e+02 | 2.300000e+04 | 5.300000e+02 |
| actor_1_facebook_likes | 1.000000e+03 | 4.000000e+04 | 1.100000e+04 | 2.700000e+04 | 6.400000e+02 |
| gross | 7.605058e+08 | 3.094042e+08 | 2.000742e+08 | 4.481306e+08 | 7.305868e+07 |
| num_voted_users | 8.862040e+05 | 4.712200e+05 | 2.758680e+05 | 1.144337e+06 | 2.122040e+05 |
| cast_total_facebook_likes | 4.834000e+03 | 4.835000e+04 | 1.170000e+04 | 1.067590e+05 | 1.873000e+03 |
| facenumber_in_poster | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 0.000000e+00 | 1.000000e+00 |
| num_user_for_reviews | 3.054000e+03 | 1.238000e+03 | 9.940000e+02 | 2.701000e+03 | 7.380000e+02 |
| language | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 | 1.000000e+01 |
| country | 4.500000e+01 | 4.500000e+01 | 4.400000e+01 | 4.500000e+01 | 4.500000e+01 |
| content_rating | 7.000000e+00 | 7.000000e+00 | 7.000000e+00 | 7.000000e+00 | 7.000000e+00 |
| budget | 2.370000e+08 | 3.000000e+08 | 2.450000e+08 | 2.500000e+08 | 2.637000e+08 |
| title_year | 2.009000e+03 | 2.007000e+03 | 2.015000e+03 | 2.012000e+03 | 2.012000e+03 |
| actor_2_facebook_likes | 9.360000e+02 | 5.000000e+03 | 3.930000e+02 | 2.300000e+04 | 6.320000e+02 |
| imdb_score | 7.900000e+00 | 7.100000e+00 | 6.800000e+00 | 8.500000e+00 | 6.600000e+00 |
| aspect_ratio | 1.780000e+00 | 2.350000e+00 | 2.350000e+00 | 2.350000e+00 | 2.350000e+00 |
| movie_facebook_likes | 3.300000e+04 | 0.000000e+00 | 8.500000e+04 | 1.640000e+05 | 2.400000e+04 |
| director_name_value_counts | 7.000000e+00 | 7.000000e+00 | 8.000000e+00 | 8.000000e+00 | 3.000000e+00 |
| actor_2_name_value_counts | 3.000000e+00 | 7.000000e+00 | 2.000000e+00 | 5.000000e+00 | 3.000000e+00 |
| main_genre | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
| genres_value_counts | 1.200000e+01 | 2.500000e+01 | 4.500000e+01 | 2.200000e+01 | 4.600000e+01 |
| actor_1_name_value_counts | 4.000000e+00 | 3.800000e+01 | 4.000000e+00 | 9.000000e+00 | 2.000000e+00 |
| actor_3_name_value_counts | 3.000000e+00 | 4.000000e+00 | 1.000000e+00 | 2.000000e+00 | 1.000000e+00 |
| main_plot_keyword_value_counts | 2.000000e+00 | 1.000000e+00 | 7.000000e+00 | 2.000000e+00 | 6.900000e+01 |
dataset.profile_report()
Looking at the profile report, we now see warnings about skewness and zeros. We deal with these after splitting the dataset, when the features are scaled. Unwanted variables will also be removed during feature elimination.
datasetR = dataset.copy() #lets keep our original dataset for reference. Here datasetR is for Regression model
datasetC = dataset.copy() #Here datasetC is for classification model
from sklearn.model_selection import train_test_split
y = datasetR.pop('imdb_score')
X = datasetR
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8, test_size = 0.2, random_state = 42)
X_train.shape, y_train.shape, X_test.shape, y_test.shape
((3053, 26), (3053,), (764, 26), (764,))
We scale after splitting the dataset so that the scaler is fitted on the training set only and no information from the test set leaks into it.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train.values), columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test.values), columns = X_train.columns, index = X_test.index)
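The fit-on-train, transform-on-test pattern can be illustrated on a tiny made-up example. The arrays `train` and `test` below are hypothetical. Since the minimum and maximum are learned from the training rows only, scaled test values are not guaranteed to stay inside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data.
train = np.array([[0.0], [5.0], [10.0]])
test = np.array([[2.5], [20.0]])

scaler_demo = MinMaxScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns min=0, max=10 from train
test_scaled = scaler_demo.transform(test)        # reuses the train min and max

print(train_scaled.ravel(), test_scaled.ravel())
```

The test value 20.0 lies outside the training range, so it scales to a value above 1. This is expected behaviour rather than an error.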
Now we build our model. We have many features, of which only some will be useful, so let's do some feature selection for our regression model.
X_train.shape
(3053, 26)
We do not want to feed the model variables that might not help prediction. We remove variables with high collinearity and then keep only the variables useful for our model via Recursive Feature Elimination (RFE).
# removing variables with high collinearity
def correlation(dataset, threshold):
    col_corr = set() # Set of all the names of deleted columns
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if (corr_matrix.iloc[i, j] >= threshold) and (corr_matrix.columns[j] not in col_corr):
                colname = corr_matrix.columns[i] # getting the name of column
                col_corr.add(colname)
                if colname in dataset.columns:
                    del dataset[colname] # deleting the column from the dataset
correlation(X_train,0.90)
X_train.shape
(3053, 25)
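The function above compares the signed correlation against the threshold and deletes columns from the frame in place. As a hedged alternative, the sketch below uses the absolute correlation, so strongly negative pairs are caught too, and returns a new frame instead of mutating its input. The helper name `drop_correlated` and the toy frame are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

def drop_correlated(frame, threshold):
    """Return (reduced_frame, dropped_names), dropping every column whose
    absolute correlation with an earlier, still-kept column >= threshold."""
    corr = frame.corr().abs()
    dropped = []
    for i, col in enumerate(corr.columns):
        for prev in corr.columns[:i]:
            if prev not in dropped and corr.loc[col, prev] >= threshold:
                dropped.append(col)
                break
    return frame.drop(columns=dropped), dropped

# Toy data: b is a multiple of a, d is the negation of a, c is unrelated.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, -1.0, 1.0, -1.0],
    "d": [-1.0, -2.0, -3.0, -4.0],
})

reduced, dropped = drop_correlated(toy, 0.90)
print(dropped, list(reduced.columns))
```

Returning the list of dropped names makes it easy to drop the same columns from a test set afterwards.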
#importing the required libraries
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
# Running RFE with the number of selected variables set to 15
lm = LinearRegression()
lm.fit(X_train, y_train)
rfe = RFE(lm, n_features_to_select=15) # running RFE (recent scikit-learn versions require the keyword argument)
rfe = rfe.fit(X_train, y_train)
list(zip(X_train.columns,rfe.support_,rfe.ranking_))
[('color', True, 1),
('num_critic_for_reviews', True, 1),
('duration', True, 1),
('director_facebook_likes', False, 11),
('actor_3_facebook_likes', True, 1),
('actor_1_facebook_likes', True, 1),
('gross', True, 1),
('num_voted_users', True, 1),
('facenumber_in_poster', True, 1),
('num_user_for_reviews', True, 1),
('language', True, 1),
('country', False, 2),
('content_rating', False, 10),
('budget', True, 1),
('title_year', True, 1),
('actor_2_facebook_likes', True, 1),
('aspect_ratio', False, 3),
('movie_facebook_likes', True, 1),
('director_name_value_counts', False, 7),
('actor_2_name_value_counts', False, 9),
('main_genre', True, 1),
('genres_value_counts', False, 4),
('actor_1_name_value_counts', False, 5),
('actor_3_name_value_counts', False, 6),
('main_plot_keyword_value_counts', False, 8)]
col_rfe = X_train.columns[rfe.support_]
col_rfe
Index(['color', 'num_critic_for_reviews', 'duration', 'actor_3_facebook_likes',
'actor_1_facebook_likes', 'gross', 'num_voted_users',
'facenumber_in_poster', 'num_user_for_reviews', 'language', 'budget',
'title_year', 'actor_2_facebook_likes', 'movie_facebook_likes',
'main_genre'],
dtype='object')
X_train.columns[~rfe.support_]
Index(['director_facebook_likes', 'country', 'content_rating', 'aspect_ratio',
'director_name_value_counts', 'actor_2_name_value_counts',
'genres_value_counts', 'actor_1_name_value_counts',
'actor_3_name_value_counts', 'main_plot_keyword_value_counts'],
dtype='object')
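To show how RFE arrives at such a ranking, here is a small sketch on synthetic data, with all names and numbers made up. At each round RFE refits the estimator and removes the feature with the smallest absolute coefficient, until the requested number remains. Selected features get rank 1 and the others are ranked in reverse order of elimination.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: only features 0 and 2 actually drive the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 2] + 0.01 * rng.normal(size=200)

rfe_demo = RFE(LinearRegression(), n_features_to_select=2)
rfe_demo.fit(X_demo, y_demo)

support = list(rfe_demo.support_)   # True for the features that were kept
ranking = list(rfe_demo.ranking_)   # 1 for kept features, larger = dropped earlier
print(support, ranking)
```

Because the elimination relies on coefficient magnitudes, it is only meaningful when the features are on comparable scales, which is why the scaling step comes first.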
# Creating an X_train dataframe with the RFE-selected variables
X_train_rfe = X_train[col_rfe]
# Adding a constant variable for using the stats model
import statsmodels.api as sm
X_train_rfe_constant = sm.add_constant(X_train_rfe)
lm = sm.OLS(y_train,X_train_rfe_constant).fit() # Running the linear model
#Let's see the summary of our linear model
print(lm.summary())
OLS Regression Results
==============================================================================
Dep. Variable: imdb_score R-squared: 0.377
Model: OLS Adj. R-squared: 0.374
Method: Least Squares F-statistic: 122.4
Date: Wed, 04 Sep 2019 Prob (F-statistic): 7.83e-298
Time: 02:23:39 Log-Likelihood: -3793.9
No. Observations: 3053 AIC: 7620.
Df Residuals: 3037 BIC: 7716.
Df Model: 15
Covariance Type: nonrobust
==========================================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------------------
const 6.8264 0.180 37.892 0.000 6.473 7.180
color -0.3608 0.086 -4.220 0.000 -0.528 -0.193
num_critic_for_reviews 2.0974 0.178 11.753 0.000 1.747 2.447
duration 2.9343 0.220 13.312 0.000 2.502 3.367
actor_3_facebook_likes -1.0232 0.245 -4.172 0.000 -1.504 -0.542
actor_1_facebook_likes 0.9896 0.688 1.438 0.150 -0.359 2.339
gross -1.0088 0.200 -5.049 0.000 -1.401 -0.617
num_voted_users 6.4188 0.323 19.902 0.000 5.786 7.051
facenumber_in_poster -1.1126 0.313 -3.556 0.000 -1.726 -0.499
num_user_for_reviews -3.4556 0.332 -10.414 0.000 -4.106 -2.805
language 1.5674 0.203 7.703 0.000 1.168 1.966
budget -1.1040 0.611 -1.806 0.071 -2.302 0.094
title_year -2.0488 0.177 -11.547 0.000 -2.397 -1.701
actor_2_facebook_likes 0.3081 0.152 2.026 0.043 0.010 0.606
movie_facebook_likes -0.3907 0.228 -1.713 0.087 -0.838 0.056
main_genre 0.4281 0.085 5.052 0.000 0.262 0.594
==============================================================================
Omnibus: 471.393 Durbin-Watson: 1.926
Prob(Omnibus): 0.000 Jarque-Bera (JB): 994.415
Skew: -0.916 Prob(JB): 1.16e-216
Kurtosis: 5.113 Cond. No. 78.1
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
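Because the features were min-max scaled on the training set before fitting, each coefficient in the summary is the change in predicted score when that feature moves from its training minimum to its training maximum. It is not a per-unit effect. The sketch below, on made-up numbers, shows how a scaled coefficient relates to a per-unit one. The data and the names `raw`, `target` and `scaler_demo` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw feature (for example a duration in minutes) and target.
raw = np.array([[60.0], [90.0], [120.0], [180.0]])
target = 5.0 + 0.01 * raw.ravel()  # true per-unit effect is 0.01

scaler_demo = MinMaxScaler().fit(raw)
model = LinearRegression().fit(scaler_demo.transform(raw), target)

# Coefficient on the scaled feature: effect of going from min (60) to max (180).
coef_scaled = model.coef_[0]

# Dividing by the training range recovers the effect per original unit.
feature_range = scaler_demo.data_max_[0] - scaler_demo.data_min_[0]
coef_per_unit = coef_scaled / feature_range

print(coef_scaled, coef_per_unit)
```

Read this way, a large scaled coefficient on a feature with a very wide range can still correspond to a tiny per-unit effect.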
X_test_rfe = X_test[col_rfe]
X_test_rfe_constant = sm.add_constant(X_test_rfe)
y_pred_linear = lm.predict(X_test_rfe_constant)
y_pred_linear.values
array([ 7.01444312, 6.00876198, 6.62347135, 6.59482027, 6.18268366,
6.79254053, 6.71765315, 6.25384279, 6.08470355, 7.20717633,
5.71014709, 6.06612977, 5.9959564 , 6.64196818, 6.08914272,
7.42534011, 6.98659027, 6.18680856, 6.7656102 , 5.84074487,
5.88570792, 6.27443048, 6.38443962, 6.45300389, 5.94810376,
6.55519627, 6.83348579, 6.78324774, 7.49528209, 6.7076033 ,
6.41911769, 6.91743602, 6.19909485, 8.5792837 , 5.99658715,
6.1153123 , 6.07696256, 8.68334707, 6.58889215, 6.4787575 ,
6.39522147, 7.5316076 , 5.82752705, 6.28471002, 6.73123554,
5.72430876, 6.07725613, 6.02306239, 5.93702133, 5.46658224,
6.4272058 , 6.07471918, 5.74102048, 6.38838576, 7.40419389,
6.40289355, 5.98113985, 7.70453531, 8.19492268, 6.7685695 ,
7.04712224, 6.80422735, 9.94078061, 7.01846469, 8.20775291,
6.33402399, 6.35680065, 5.54112474, 6.49460001, 6.95320643,
6.1077915 , 6.77861348, 5.89979206, 6.11757466, 6.56769757,
6.07898786, 6.09894807, 6.34864654, 6.28462213, 6.06344886,
6.67896183, 6.26708125, 5.6900226 , 6.09804496, 7.17687135,
6.80343483, 9.2936546 , 6.25060878, 5.97708893, 5.98551069,
6.53302944, 6.86141968, 6.4195812 , 6.83240642, 6.71601208,
7.3574184 , 6.26536451, 7.26739823, 6.33981875, 6.69735576,
5.85849456, 6.65665419, 6.36601689, 6.22925778, 6.26067074,
6.06169243, 7.19567132, 6.35786519, 7.6193287 , 6.43435955,
6.79051612, 6.86136887, 6.32813145, 5.74222925, 6.68259266,
6.61671119, 5.74009576, 7.00903988, 7.74325908, 6.11405286,
7.60989869, 6.19226116, 5.81163381, 6.52134254, 6.20032616,
6.59550285, 7.52259346, 5.99911076, 5.98384559, 5.84605346,
6.54537472, 5.86255861, 6.06715117, 6.38005338, 6.47530756,
6.11636882, 5.71504247, 6.53060277, 7.23602877, 7.9957383 ,
6.08008597, 6.57033402, 7.6836214 , 6.20019198, 6.03092416,
6.24876309, 6.23726326, 6.57265252, 6.17005615, 6.20210799,
6.77262489, 6.19147131, 6.76991856, 5.92795216, 6.06377103,
6.61585331, 7.06441291, 6.19610145, 6.46782712, 6.37241704,
5.63586991, 7.25474476, 6.96829038, 6.78625748, 6.46503481,
6.23551168, 5.92419713, 6.24987751, 7.67712297, 6.24988283,
5.84008071, 7.94358893, 6.29906698, 7.1589256 , 6.29195319,
6.75674225, 6.40587035, 6.48889574, 7.14505097, 6.86725381,
6.18502399, 5.51875208, 5.96539252, 6.04442617, 6.29465585,
6.04802445, 5.99975866, 8.00829088, 7.12744368, 6.2979498 ,
6.5537354 , 6.80870917, 5.91064586, 6.43525308, 6.77667533,
7.57716341, 6.21157027, 6.6172308 , 6.42923658, 6.05737647,
6.81780002, 6.67406536, 6.43954199, 6.09379252, 7.56872381,
6.64216546, 6.2982413 , 6.22231666, 5.9404799 , 6.72407447,
5.88988139, 6.66384966, 6.20501767, 6.10281328, 5.4427566 ,
6.26981833, 6.13922053, 6.4088028 , 7.26330704, 6.48447318,
6.33907907, 6.0144283 , 6.26902963, 5.93925155, 6.2237208 ,
6.19056256, 6.22034983, 7.11237893, 6.30074649, 7.92732917,
6.41444925, 6.62002941, 6.07222206, 6.01523054, 6.54264732,
6.37123441, 6.49492952, 6.13655517, 7.36435425, 7.04562505,
8.97696393, 6.51025593, 6.04980787, 6.40145846, 6.37170652,
6.15351051, 5.78637192, 6.54410199, 6.05597997, 5.99549414,
5.59407126, 5.99662106, 6.74887423, 5.54502022, 6.09649921,
6.55840164, 6.49347791, 7.27922689, 5.77079801, 6.58741183,
6.22805196, 6.30303214, 6.10993779, 5.94756547, 5.54521954,
6.42095279, 6.02265969, 6.27956521, 5.92152949, 6.73473049,
5.68540873, 6.50954598, 6.08873957, 6.21163015, 6.15199405,
6.71236178, 5.78244426, 6.06810501, 6.30729665, 6.49903286,
7.06540656, 5.94969899, 7.31455884, 5.87292106, 8.23923134,
5.87083642, 5.9582297 , 6.18313475, 6.14879359, 5.98514567,
6.27524655, 6.32182334, 6.5007595 , 6.17042473, 6.79076453,
5.9049523 , 6.65665343, 6.36022798, 6.46280353, 6.01718958,
6.28119645, 6.50855401, 6.27367248, 5.66700574, 6.33183109,
6.71268603, 7.283805 , 6.22250404, 6.07339987, 6.04154328,
6.45677419, 5.72348801, 4.22900969, 8.05556047, 5.96250053,
6.53225024, 6.00109631, 7.3317144 , 7.51401592, 5.8766708 ,
6.20029658, 6.01078157, 6.80142277, 5.67930959, 7.92583261,
6.00090556, 5.77468913, 7.46833602, 6.73391619, 9.2914372 ,
6.20960017, 6.70260262, 6.24221717, 5.85058944, 6.73499364,
5.81114328, 7.04249081, 6.32907046, 6.274517 , 5.9889109 ,
5.81824617, 6.58427659, 6.49183953, 5.73546766, 7.19559454,
6.15443868, 6.74877786, 6.07017485, 5.79045305, 6.6413898 ,
8.189022 , 6.00257543, 6.21917704, 5.95772676, 6.61659413,
7.51270485, 6.86250378, 6.17017735, 5.90496364, 6.29030756,
6.6783423 , 6.48925282, 6.19443718, 6.33952312, 7.26164821,
6.04380702, 6.70863009, 5.90751674, 7.08487218, 6.71024076,
6.05406687, 5.88335329, 6.94040368, 6.10684993, 7.1927546 ,
6.8115464 , 5.79862251, 5.79539035, 6.39321974, 6.00895891,
6.44311272, 6.45231899, 6.16807742, 6.09958337, 6.44488228,
7.09042015, 6.26943513, 6.49391649, 6.34107833, 6.39115242,
5.90563051, 6.09634274, 8.08887053, 6.10393951, 5.78627411,
5.92374689, 5.95016789, 5.93949908, 6.4731637 , 8.44826466,
5.69480182, 6.70363477, 6.51495503, 5.9131167 , 6.02365892,
5.84912726, 6.58788384, 5.72245227, 6.51354025, 5.68987254,
6.83523894, 6.88002629, 5.86464737, 6.69985233, 6.42712472,
5.97948586, 5.40940077, 6.31156297, 6.70203347, 6.10923648,
5.94567711, 6.07116563, 6.42268337, 5.87284826, 6.8955506 ,
6.04955246, 7.03701716, 6.7690469 , 6.44846456, 5.99409098,
5.80772334, 5.77383824, 6.03193041, 6.74495677, 5.90484667,
6.18197985, 6.50649506, 7.17897095, 5.94794234, 6.4853657 ,
6.44635244, 8.19680847, 6.19357436, 5.88707597, 5.64585352,
5.73957016, 6.22880603, 6.10619147, 6.09469429, 6.16984236,
5.81690114, 7.09740376, 6.42341305, 6.59595603, 5.69960076,
7.11748518, 6.4197954 , 5.94468312, 7.05993391, 5.64469199,
7.93414685, 6.74227466, 6.66238308, 6.33628148, 8.22561696,
6.55469714, 6.11493135, 6.45024932, 6.8683111 , 6.25164502,
5.96494442, 5.91709433, 6.15006431, 6.44365713, 6.69988272,
7.03681332, 6.19511232, 5.95061831, 6.58506859, 7.15126841,
5.64406475, 6.80011852, 6.01136667, 6.7403674 , 5.91120206,
6.05884373, 7.45721939, 6.93198441, 7.0756329 , 5.69769759,
5.99499986, 9.55237278, 7.9656805 , 5.80282446, 6.30162648,
6.43039813, 6.43370286, 7.39078449, 6.45644908, 6.333051 ,
6.52462051, 6.01074205, 6.24477531, 6.27505464, 7.13651959,
5.96909354, 6.65019128, 6.51673145, 6.67802657, 6.58498844,
5.92427786, 6.90438109, 6.49576497, 6.11994619, 6.48438139,
6.70550927, 6.43469491, 6.9102127 , 6.44028909, 5.65193172,
6.18621801, 6.34749497, 7.69975682, 6.22261447, 5.57816294,
6.52217687, 7.94948028, 5.90849229, 5.78995822, 6.74749624,
6.54538231, 6.11233055, 6.15366431, 5.79742699, 6.4363126 ,
5.80522 , 5.83391965, 6.43933752, 5.90171803, 6.97508594,
6.47511162, 6.68378361, 6.54166793, 5.85523074, 6.94170412,
6.33052491, 6.35867934, 6.31764243, 5.84858533, 6.14953201,
6.45431843, 6.04380702, 7.25187546, 6.30412615, 5.90575258,
6.45172087, 6.70991796, 6.00366095, 5.93902873, 7.10822888,
7.04484267, 6.1062322 , 6.42566104, 6.20377184, 5.97716923,
9.09688977, 6.44708562, 6.36378089, 6.30788124, 6.34851919,
5.91320037, 6.05094146, 6.08843189, 6.33761116, 6.68647361,
7.28558696, 5.88369267, 5.81831716, 6.47864656, 5.96179678,
6.04710435, 6.00632327, 6.15608871, 5.94370644, 5.95142098,
6.26129077, 6.16084717, 6.54911051, 6.65401811, 6.30455519,
6.33869199, 5.9761729 , 6.8219931 , 6.49647624, 6.22451929,
5.85095411, 6.41152492, 6.82971444, 6.09316424, 6.3513206 ,
7.3537442 , 8.02587966, 6.58400564, 5.83724702, 5.71308104,
6.41542072, 7.88027143, 6.76854681, 6.41583918, 7.72456905,
6.02692742, 6.42987289, 6.88589573, 6.03705629, 6.01266741,
5.81847581, 5.88521781, 6.47843902, 6.40094048, 6.9386258 ,
6.50918185, 7.37670489, 6.04022072, 6.7027812 , 7.97129474,
5.667032 , 6.30134597, 6.27425087, 6.44121654, 5.78910508,
6.37522398, 6.13802408, 6.01403536, 6.07923569, 10.42232896,
5.81701064, 6.6929865 , 5.66144859, 7.64836821, 6.06970885,
7.66482855, 7.15021818, 5.73907937, 6.91044773, 7.00816465,
7.42268247, 5.93625303, 6.38337433, 6.46869506, 6.58372315,
6.20568159, 6.26154761, 6.65780666, 6.57827224, 6.38444268,
5.69379041, 5.60641813, 6.81442859, 5.93377003, 7.58862828,
6.39220674, 6.02521452, 5.83845626, 5.95926353, 10.66469837,
6.02908676, 5.70371781, 5.95369812, 7.67037693, 6.59646858,
9.86699488, 5.91806275, 6.07221182, 6.08406952, 6.1050526 ,
5.74238741, 5.93811455, 5.9171457 , 7.32447183, 6.15333266,
7.10672317, 5.93598167, 5.99253314, 5.86523774, 6.64058503,
6.3136585 , 6.08549748, 6.13823511, 5.70000006, 5.73122672,
5.8690804 , 6.89286465, 5.89062604, 6.04140067, 6.167561 ,
6.59797452, 6.18420537, 6.22729618, 6.36669937, 6.15675684,
6.87611213, 6.05312108, 6.72227043, 6.37326009, 6.02299821,
6.11843381, 5.99468849, 6.27757411, 6.15790642, 6.75300204,
6.85807015, 6.11895302, 8.25581907, 5.6595498 , 6.14232042,
6.5770143 , 6.33368037, 6.76689137, 7.00661649, 5.98409742,
6.11064449, 5.86027521, 6.04245422, 5.87572863, 6.10881253,
6.28651308, 7.58377321, 6.08103034, 6.82925452, 6.86310695,
6.83888807, 6.12001288, 6.65993953, 7.6752316 , 6.43508087,
7.47535674, 7.03011855, 6.57779147, 6.37566343, 6.73568608,
6.59515775, 7.66768608, 6.24842859, 6.46554374, 7.23298334,
6.14017329, 6.31792255, 6.01220764, 6.76173518, 5.64701079,
6.56848577, 5.97890944, 6.06093829, 6.10379742, 5.80686702,
6.28243399, 7.04786523, 5.73648904, 6.05909793, 7.04155029,
6.14810088, 6.092187 , 6.41011302, 6.12397594])
y_pred_linear.min(), y_pred_linear.max()
(4.229009694035662, 10.664698367074877)
from sklearn.metrics import mean_squared_error
# scikit-learn's signature is (y_true, y_pred); MSE is symmetric, so the value is unchanged.
mean_squared_error(y_test, y_pred_linear)
0.7064048678599874
Looking at these statistics, we observe that the R² score is low, about 0.37, even after keeping only the consistent variables, so the regression line does not fit the data well. We therefore move on to more flexible, nonlinear models, namely support vector machines and ensemble algorithms, to fit the data better.
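As a rough illustration of why a curved model can help when a straight line underfits, the sketch below compares a linear regression with an RBF-kernel SVR using R² and MSE. It runs on synthetic data, not the movie dataset; the names `X_demo`, `y_demo`, and `demo_scores` and the sine-shaped target are assumptions made only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the movie dataset): a curved relationship plus noise.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(400, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Fit both models on the same split and score them on the held-out part.
demo_scores = {}
for name, model in [("linear", LinearRegression()), ("svr_rbf", SVR(kernel="rbf"))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    demo_scores[name] = {
        "r2": r2_score(y_te, pred),
        "mse": mean_squared_error(y_te, pred),
    }

for name, s in demo_scores.items():
    print(f"{name}: R2={s['r2']:.3f}, MSE={s['mse']:.3f}")
```

On data like this the kernel model should score a clearly higher R² than the straight line. Whether the same holds for the movie features has to be checked on the real test set.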
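from sklearn.svm import SVR
# Three SVR variants; each is fitted and evaluated separately below.
svr_rbf = SVR(kernel='rbf', gamma=0.1)
# gamma is ignored by the linear kernel, so it is omitted here.
svr_lin = SVR(kernel='linear')
svr_poly = SVR(kernel='poly', gamma='auto', degree=3)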
svr_rbf.fit(X_train_rfe, y_train)
y_pred_svm_rbf = svr_rbf.predict(X_test_rfe)
y_pred_svm_rbf
array([7.13693576, 6.08113513, 6.77808165, 6.59318336, 6.26117227,
6.76335536, 6.94767209, 6.3861124 , 6.1395267 , 7.06277244,
5.96135133, 6.21115658, 6.42724741, 6.75124309, 6.21902249,
7.76001253, 7.11317133, 6.26515929, 6.80258488, 6.05281246,
6.10723795, 6.25076868, 6.65536616, 6.5606972 , 6.09113033,
6.69135271, 6.80187125, 6.86217694, 7.49892584, 6.89491778,
6.74152486, 7.05190519, 6.45804696, 8.30881778, 6.11161271,
6.24325829, 6.21618488, 8.05413198, 6.47557152, 6.56259474,
6.45950159, 7.33496651, 6.00391353, 7.13339919, 6.85608267,
5.9068141 , 6.24892883, 6.08417666, 6.13227679, 5.73604416,
6.56252083, 6.33413872, 5.83216684, 6.63872997, 7.30683801,
6.52196493, 6.11905187, 7.54156147, 8.441141 , 6.80261269,
7.07092282, 6.78956984, 9.09365559, 7.00108708, 8.21972302,
6.40869278, 7.23921413, 5.81769794, 6.59008364, 7.14253524,
6.30329565, 7.20499307, 6.16348681, 6.27976562, 6.57121336,
6.28424963, 6.32279469, 6.44937533, 6.47258632, 6.22265266,
6.82906996, 6.29666384, 5.82510803, 6.28262025, 7.01506714,
7.19284676, 8.93486362, 6.30633494, 6.14639514, 6.25963859,
6.54307126, 6.84924592, 6.41804702, 6.76130583, 6.82113819,
7.44165915, 6.35810801, 7.26880745, 6.4869866 , 6.88681264,
5.87366716, 6.65960136, 6.47891934, 6.31504331, 6.3066754 ,
6.13506366, 7.09993901, 6.53675109, 7.58248009, 6.42160614,
6.78990162, 6.93072243, 6.60932993, 5.9103457 , 6.83062692,
6.57711549, 5.91283778, 7.07138714, 7.74251159, 6.22161514,
7.65437436, 6.29988163, 5.9602958 , 6.66683713, 6.29969895,
6.73830337, 7.92851465, 6.27801031, 6.19448194, 6.31086516,
6.58691878, 6.09726171, 6.24987731, 7.07046708, 6.56944831,
6.34721377, 5.94517616, 6.74094491, 7.1094273 , 8.04569318,
6.33946892, 6.66152754, 7.41686082, 6.25864936, 6.1539585 ,
6.53168879, 6.44693389, 6.67279085, 6.63734875, 6.31334322,
6.72250858, 6.3118216 , 6.6830422 , 6.17721837, 6.11194276,
6.85783505, 6.96638058, 6.44235163, 6.44888472, 6.51959539,
5.81506029, 7.20161976, 7.19025965, 6.74409098, 6.71256295,
6.44660866, 6.07972891, 6.35158696, 7.70104196, 6.24606284,
6.13656875, 7.70207737, 6.444551 , 7.18939901, 6.47803328,
7.02984768, 7.00153754, 6.57672629, 7.32990101, 6.99490573,
6.50546905, 5.69068346, 6.15169517, 6.21041299, 6.4303461 ,
6.2774236 , 6.22627123, 8.00424464, 7.23921248, 6.65713132,
6.87498733, 6.92989509, 6.04660391, 6.65899749, 6.89243528,
7.33369668, 6.38372231, 6.69954053, 6.53939361, 6.32786392,
7.18316446, 6.78162997, 6.69172698, 6.22541458, 7.45626133,
6.70770829, 6.3687587 , 6.33568729, 6.15583662, 6.77014524,
6.0888132 , 6.72390634, 6.40429153, 6.26936909, 5.73557915,
6.37992829, 6.24755853, 6.52973811, 7.3732887 , 6.48288807,
6.69545961, 6.20842591, 6.39586229, 6.04644887, 6.40701926,
6.48969868, 6.46231906, 6.99697267, 6.38299915, 7.54983764,
6.59781057, 6.7736318 , 6.26184647, 6.29699378, 6.59135012,
6.93877846, 6.7421288 , 6.32072529, 7.21433679, 7.04269318,
8.73841154, 6.63661344, 6.33011692, 6.54979263, 6.42308222,
6.80056075, 5.94741234, 6.5595302 , 6.13728185, 6.53792683,
5.83896457, 6.18865037, 6.86025015, 5.78940457, 6.11119475,
6.77348793, 6.76914296, 7.19029276, 5.99286435, 6.5243984 ,
6.22915514, 6.38321786, 6.28607624, 6.0511558 , 5.87160794,
6.60270667, 6.3090483 , 6.47257676, 6.11402379, 6.88730917,
5.84811727, 6.61989531, 6.22438976, 6.35244885, 6.14968261,
6.72080457, 5.9581535 , 6.17516959, 6.34108343, 7.01604885,
7.32161491, 6.05830679, 7.08692253, 6.12926892, 7.76367733,
6.02143756, 6.2338998 , 6.32635082, 6.53076377, 6.10534672,
6.45304647, 6.34699421, 6.55661815, 6.21071558, 6.82628031,
6.10510795, 6.63296653, 6.51674328, 6.56094942, 6.19887382,
6.46568365, 6.65825527, 6.38654207, 5.85250478, 6.49003326,
6.80622085, 7.09630891, 6.266852 , 6.2383163 , 6.3579086 ,
6.49856367, 5.88552745, 5.63807313, 7.88425141, 6.10119671,
6.62073271, 6.2910287 , 7.35293235, 7.59559265, 6.0493053 ,
6.20170285, 6.14134301, 6.76506013, 6.03607549, 7.99478666,
6.21401653, 6.04010422, 7.44632838, 7.39883684, 8.81818344,
6.89122679, 7.12683039, 6.40787067, 6.17432834, 6.89967992,
6.07914334, 7.16927148, 6.30476645, 6.38270212, 6.21852176,
5.97282009, 7.03873329, 6.51375359, 5.947716 , 7.3377513 ,
6.37404935, 6.86418214, 6.28827307, 5.99796137, 6.53113215,
8.15509899, 6.21381889, 6.3661853 , 6.27167109, 6.71078349,
7.48977334, 7.07765887, 6.48419818, 6.15975559, 6.78786846,
6.70295906, 6.60474074, 6.31728326, 6.51585202, 7.16310953,
6.21003583, 6.84794891, 6.13843786, 7.09849623, 6.54991828,
6.27344733, 6.02373657, 6.99097801, 6.37162706, 7.2358919 ,
7.19972422, 6.00771726, 6.01097171, 6.55324834, 6.19031081,
6.6148951 , 6.50307659, 6.30045979, 6.2815652 , 6.46764972,
6.95777078, 6.40710975, 6.46260358, 6.35229705, 6.35383017,
6.12113152, 6.33707838, 8.07437347, 6.16951436, 5.90570986,
6.17130166, 6.1594313 , 6.13227689, 6.62629479, 8.37531294,
5.84692747, 6.76949013, 6.84976534, 6.09128399, 6.65683298,
6.27841043, 6.66227026, 5.91714669, 6.73802314, 5.93505142,
6.67687246, 7.04737197, 6.07754193, 6.76242788, 6.62122185,
6.15724067, 7.03645098, 6.43171164, 6.84909543, 6.26769522,
6.44821782, 6.22762498, 6.44228698, 6.02498329, 7.00608826,
6.21073457, 7.05826469, 7.22163182, 6.48714367, 6.04875897,
6.00067253, 5.95633735, 6.4357929 , 6.88108545, 6.06827971,
6.38211379, 6.52940006, 7.08690904, 6.2761522 , 6.474862 ,
6.53917492, 7.76307669, 6.29447052, 6.03162788, 6.0509349 ,
5.90204052, 6.30407008, 6.28093524, 6.30035587, 6.4351059 ,
5.93437683, 7.14483655, 6.48416786, 6.74803578, 5.87358811,
7.1962986 , 6.55482701, 6.10725566, 7.42303541, 5.843972 ,
7.81690407, 6.92105002, 6.67360878, 6.35043914, 7.97382938,
6.52348714, 6.25608072, 6.48499023, 6.86678643, 6.42179223,
6.12632971, 6.11701341, 6.30514228, 6.61252537, 6.76244616,
7.01136751, 6.46618483, 6.10841812, 6.8473938 , 7.12305082,
6.0697041 , 6.84381419, 6.20850712, 6.89313987, 6.32605975,
6.25206323, 7.40398657, 7.1469514 , 7.04618335, 5.92007904,
6.1525773 , 8.85657427, 5.81456025, 5.99682833, 6.45207444,
6.58517873, 6.87731171, 7.19488783, 6.64189906, 6.4943558 ,
6.75732033, 6.30621775, 6.42760309, 6.3304681 , 7.18366702,
6.21163 , 6.84294995, 6.64644973, 6.73245875, 6.72867153,
6.1367442 , 6.9234181 , 6.65772911, 6.30364297, 6.62151946,
6.73349419, 6.6723672 , 7.16720283, 6.50890441, 6.80388987,
6.44070064, 6.4731591 , 7.49474901, 6.40782583, 5.74979037,
6.43755776, 7.96575173, 5.94623016, 5.9977321 , 6.75210728,
6.58692342, 6.31398371, 6.54628352, 6.03772887, 6.57068514,
6.0858244 , 5.96401394, 6.4897282 , 6.06959289, 6.88464637,
6.60239462, 6.90760622, 6.70452864, 6.06329516, 7.0696709 ,
6.52252616, 6.47167238, 6.3157022 , 6.01479864, 6.21985965,
6.58835935, 6.21003583, 7.09068946, 6.66953881, 6.18255458,
6.65040555, 6.73779174, 6.28249674, 6.16711735, 7.04742572,
6.98649641, 6.18131287, 6.60386769, 6.43009634, 6.07623163,
8.71655661, 6.47723357, 6.36017083, 6.63697496, 6.49100744,
6.14231433, 6.27292661, 6.19911893, 6.39848332, 6.83138101,
7.34733876, 6.06775096, 6.06808723, 6.65394358, 6.02758887,
6.24410299, 5.83027597, 6.24479973, 6.04086759, 6.15744914,
6.44860299, 6.42427071, 6.63993933, 6.74174758, 6.47365444,
6.72944898, 6.10831023, 6.90590538, 6.68038255, 6.37325209,
6.11603326, 6.43114998, 6.70939139, 6.26647393, 6.52917317,
7.19013702, 7.78258158, 6.80526108, 6.08053568, 5.90082441,
6.54934634, 7.84262072, 6.76494865, 6.6122349 , 7.46980144,
6.26745859, 6.59557219, 6.94761166, 6.27203024, 6.25092203,
6.04276981, 6.08606122, 6.85368205, 6.45053267, 6.84697145,
6.60637083, 7.67063478, 6.38992584, 6.87872251, 7.63675191,
5.81750928, 6.39729871, 6.46771057, 6.54159298, 6.04245775,
6.56955316, 6.28943596, 6.20938885, 6.31875453, 9.32302064,
5.93314922, 6.64586911, 5.80806114, 7.7303426 , 6.33264913,
7.49308037, 7.11615044, 6.12696364, 6.8629544 , 7.09631916,
7.14568356, 6.13600067, 6.53437839, 6.47621294, 6.67843074,
6.23000343, 6.54123021, 6.72942941, 6.92062076, 6.55187211,
5.91675872, 5.89189732, 6.81687795, 6.23151121, 7.5878262 ,
6.3949337 , 6.2238142 , 5.93283066, 6.11283641, 9.26818655,
6.62374443, 5.90713495, 6.1917037 , 7.427921 , 6.71271747,
9.17897032, 6.18536493, 6.27385361, 6.29067996, 6.23494782,
5.95103834, 6.14935691, 6.04755667, 7.28729415, 6.25358658,
6.94771934, 6.05268822, 6.20954177, 6.81535812, 6.728886 ,
6.36101573, 6.18759008, 6.26659108, 5.94387984, 5.93218617,
6.08133629, 6.82690013, 6.10996787, 6.20792686, 6.44905255,
6.77089605, 6.25284327, 6.24066759, 6.46249672, 6.62555344,
6.90314991, 6.25616038, 6.93915404, 6.4678139 , 6.16304355,
6.3627619 , 6.15288284, 6.43607866, 6.29959222, 6.80938656,
6.96697128, 6.26132142, 7.95726525, 5.88927525, 6.32346027,
6.57032008, 6.41479589, 6.92518897, 7.05170393, 6.06853077,
6.43094609, 6.08897994, 6.25398295, 6.09995325, 6.21325705,
6.31322795, 7.60703485, 6.25065889, 6.88215178, 6.81472112,
6.81899309, 6.23692551, 6.82937088, 7.46534849, 6.47235419,
7.37510395, 7.07006474, 6.61800403, 6.54731129, 6.8325027 ,
6.74527076, 7.91539102, 6.38686923, 6.68558209, 7.16035352,
6.19066998, 6.36812579, 6.10625937, 6.88634092, 5.86236157,
6.74492303, 6.18345316, 6.19151216, 6.23660546, 5.892935 ,
6.4744178 , 7.2002577 , 5.91410092, 6.21793135, 7.15854444,
6.27395635, 6.30172902, 6.49407944, 6.32600592])
y_pred_svm_rbf.min(), y_pred_svm_rbf.max()
(5.638073133700937, 9.323020641193724)
mean_squared_error(y_test, y_pred_svm_rbf)
0.6805348154672604
svr_lin.fit(X_train_rfe, y_train)
y_pred_svm_lin = svr_lin.predict(X_test_rfe)
y_pred_svm_lin
array([ 7.11401749, 6.09276738, 6.70978745, 6.7337191 , 6.25692982,
6.74986021, 6.86916927, 6.37517902, 6.14794497, 7.09123478,
5.91615225, 6.21214895, 6.14830437, 6.73500284, 6.16736974,
7.56499823, 7.08385144, 6.17406871, 6.8030164 , 6.05297964,
6.10196107, 6.32954645, 6.47383747, 6.4960798 , 5.95305706,
6.60999148, 6.76289307, 6.88317723, 7.54564818, 6.77038785,
6.60762756, 6.98954211, 6.32336002, 9.10714338, 6.14946329,
6.21252451, 6.25001553, 8.52519687, 6.56094938, 6.58836383,
6.47658496, 7.52383615, 5.99058172, 6.70376977, 6.73697422,
5.88490146, 6.25742825, 6.12669785, 6.10123985, 5.71135067,
6.56563863, 6.25751625, 5.86175162, 6.51170631, 7.37984686,
6.55032049, 6.11414923, 7.78763768, 8.38621795, 6.77983114,
7.08828201, 6.79686203, 9.62420348, 7.01356677, 8.26420167,
6.4434541 , 6.73167273, 5.75127526, 6.57171384, 7.0492827 ,
6.30606904, 6.91578289, 6.12727433, 6.25211282, 6.61088239,
6.26099298, 6.29287217, 6.36834398, 6.42977288, 6.23869443,
6.77748353, 6.33245844, 5.83990338, 6.23547911, 7.07693503,
7.05327795, 9.30006144, 6.31257264, 6.11266548, 6.1723712 ,
6.56165753, 6.82182275, 6.45091796, 6.76349254, 6.78957739,
7.41296224, 6.38186096, 7.35884563, 6.44627487, 6.83922173,
5.94204646, 6.69449664, 6.49597775, 6.34252246, 6.23404784,
6.14679007, 7.26834611, 6.48610766, 7.70565682, 6.51499408,
6.60868924, 6.79419718, 6.46205235, 5.90958649, 6.73026705,
6.69452393, 5.89935246, 7.02115955, 7.74783132, 6.275675 ,
7.76437445, 6.31738671, 5.97557479, 6.662234 , 6.33090279,
6.74069745, 7.70706671, 6.14478259, 6.15557477, 6.10971618,
6.58089575, 6.07326969, 6.2533457 , 6.7184585 , 6.56054861,
6.29704522, 5.91486705, 6.66454048, 7.16061969, 8.04717967,
6.26217826, 6.6681487 , 7.42307485, 6.32804205, 6.16346768,
6.40934351, 6.4050287 , 6.67855402, 6.41189533, 6.32870525,
6.69847944, 6.29062091, 6.71814072, 6.11785506, 6.14908759,
6.78332364, 7.14726038, 6.34162888, 6.46882707, 6.47790686,
5.82906741, 7.13682868, 7.09859935, 6.81307643, 6.62091585,
6.37887166, 6.02858529, 6.24491282, 7.61469187, 6.27112433,
6.05291227, 7.751631 , 6.40753902, 7.21756287, 6.46932019,
6.92505354, 6.65294332, 6.6224312 , 7.28251082, 6.97783888,
6.32337624, 5.63637694, 6.16246819, 6.2462247 , 6.43908212,
6.2386327 , 6.21111466, 8.01193887, 7.04558981, 6.3709234 ,
6.50978225, 6.93384276, 6.07016808, 6.49816526, 6.86387553,
7.41875553, 6.33458055, 6.67462464, 6.51879117, 6.22350967,
6.9873303 , 6.81752321, 6.47547213, 6.26037269, 7.44241296,
6.65307643, 6.34867774, 6.36300775, 6.11394352, 6.69919876,
6.05695855, 6.70201518, 6.21438017, 6.28035881, 5.70523261,
6.36985802, 6.21387324, 6.47068159, 7.33984491, 6.55027836,
6.41228312, 6.21892351, 6.38694284, 6.08157137, 6.40593901,
6.32063686, 6.39703153, 7.18996136, 6.3757408 , 7.78575183,
6.42677594, 6.68175986, 6.23389161, 6.22672902, 6.65419375,
6.5709553 , 6.58868188, 6.32612039, 7.30424093, 6.95998611,
8.96672546, 6.6851152 , 6.23184388, 6.53286468, 6.38951721,
6.46064591, 5.96091876, 6.55904594, 6.15715799, 6.27215777,
5.79925836, 6.15904847, 6.90913789, 5.76501157, 6.12391389,
6.65611903, 6.61619179, 7.15902812, 5.99179489, 6.48013498,
6.27295252, 6.37163391, 6.28236139, 6.01750146, 5.79405139,
6.58005937, 6.16731989, 6.44286023, 6.10765802, 6.8263504 ,
5.84190535, 6.6485713 , 6.25846568, 6.34023303, 6.1282801 ,
6.75373288, 5.93332345, 6.2174049 , 6.37740053, 6.67345216,
7.08790789, 6.01118021, 7.24043398, 6.04360524, 7.99100007,
6.07772389, 6.18313241, 6.34621648, 6.28724226, 6.03072176,
6.44708087, 6.37825556, 6.55118579, 6.25855876, 6.9130236 ,
6.11026429, 6.64968342, 6.52482062, 6.57720418, 6.17697843,
6.43118641, 6.65030692, 6.41331322, 5.84973512, 6.50489685,
6.92092613, 7.25031586, 6.28582297, 6.2107987 , 6.25769468,
6.48696985, 5.88109949, 1.55779249, 7.91379845, 6.11380549,
6.5966364 , 6.21935349, 7.44573573, 7.64659858, 6.04092883,
6.19155926, 6.18687601, 6.7782218 , 5.80918699, 7.93330422,
6.16527935, 6.02350663, 7.4243393 , 7.10451569, 9.27037293,
6.54103192, 6.81400102, 6.37650255, 6.10205076, 6.81062225,
5.98765336, 7.08511693, 6.4066636 , 6.38150104, 6.19649391,
6.00342257, 6.84725899, 6.50784774, 5.86983909, 7.31465428,
6.34257052, 6.90135143, 6.22310468, 6.00094504, 6.58364613,
8.26348334, 6.1603273 , 6.3311318 , 6.12867141, 6.70563132,
7.57407986, 6.93874091, 6.19359043, 6.11720509, 6.54068659,
6.70687058, 6.51972091, 6.35201397, 6.41669394, 7.2794787 ,
6.20504373, 6.88119724, 6.10991888, 7.18004702, 6.67054444,
6.22341053, 6.03252523, 7.05859151, 6.31265117, 7.14391339,
6.83774545, 5.99649221, 5.98344185, 6.45735569, 6.16784829,
6.53552734, 6.51703421, 6.29445996, 6.25749568, 6.47458511,
7.01898815, 6.36859006, 6.51219947, 6.37064838, 6.39290779,
6.10944166, 6.28348213, 8.16196122, 6.1316594 , 5.9145386 ,
6.10026473, 6.13010418, 6.11778634, 6.62011335, 8.43283267,
5.86951664, 6.7241716 , 6.50220934, 6.09319393, 6.39842211,
5.99056005, 6.67324538, 5.94047967, 6.68270981, 5.87069381,
6.71144594, 6.84379451, 5.96356297, 6.66104742, 6.52311167,
6.15858074, 6.2316598 , 6.431548 , 6.67863546, 6.27026741,
6.20743935, 6.21592427, 6.43401727, 6.05344366, 6.95621529,
6.18783679, 7.02852102, 6.90403308, 6.52362864, 6.09537905,
5.98791326, 5.85250235, 6.23212619, 6.75946381, 6.01347438,
6.29054291, 6.54323684, 7.09841178, 6.15559263, 6.4896401 ,
6.44809718, 7.90665559, 6.30542466, 6.0370571 , 5.81260165,
5.93454413, 6.29100115, 6.22738741, 6.23278953, 6.36957143,
5.96236458, 7.20047735, 6.54832944, 6.70761176, 5.90863371,
7.25166842, 6.55207812, 6.09846923, 7.18535399, 5.75421506,
7.88874631, 6.76839659, 6.6468345 , 6.39954814, 8.03392427,
6.60800522, 6.24216628, 6.47306465, 6.91729533, 6.39726166,
6.14921236, 6.11539987, 6.33007247, 6.62888223, 6.66107255,
7.25361444, 6.39805905, 6.11127068, 6.69377963, 7.33287536,
5.85288071, 6.83611518, 6.21676398, 6.77019606, 6.09411996,
6.20330967, 7.42112985, 6.49653398, 7.08279812, 5.88699129,
6.15546337, 9.55057149, 7.38782592, 5.97679269, 6.44433011,
6.56876289, 6.51246552, 7.26428171, 6.5508126 , 6.52054734,
6.71245382, 6.18059381, 6.4114302 , 6.31675051, 7.20903069,
6.12339634, 6.74333662, 6.65865955, 6.72918168, 6.70423185,
6.10530043, 6.89558706, 6.64060467, 6.31293714, 6.57343235,
6.76572889, 6.55727353, 6.97841038, 6.57648935, 6.19990538,
6.36850495, 6.44839201, 7.62832612, 6.3787371 , 5.7190496 ,
6.47316151, 7.9735769 , 6.00036012, 5.9989821 , 6.75961793,
6.58090203, 6.21261528, 6.36516559, 5.99907946, 6.56790574,
5.83134249, 5.90774515, 6.41499822, 6.06942257, 6.91974138,
6.56751513, 6.82262865, 6.71183498, 6.05202453, 7.10168499,
6.46274185, 6.40531525, 6.32279044, 6.01874914, 6.22269302,
6.6256897 , 6.20504373, 7.22366766, 6.52177254, 6.05157383,
6.61294007, 6.74195743, 6.22044972, 6.09131023, 7.09882252,
7.03956139, 6.2092467 , 6.55850101, 6.31347521, 6.15237704,
8.99170666, 6.53266771, 6.39864611, 6.47686681, 6.49020578,
6.054962 , 6.20914326, 6.21736141, 6.39145469, 6.74114251,
7.27134395, 6.08706954, 6.02590155, 6.59986072, 6.0544997 ,
6.22488975, 6.02376342, 6.30558253, 6.09807722, 6.11642343,
6.3856578 , 6.29032408, 6.60870178, 6.71710933, 6.44816344,
6.5299052 , 6.17336606, 6.85913244, 6.59991795, 6.37608644,
6.0736028 , 6.46691643, 6.70981455, 6.28075012, 6.50384131,
7.16993791, 7.97710437, 6.77105879, 6.05576201, 5.89802497,
6.53876598, 7.93974605, 6.75554859, 6.5574799 , 7.51638236,
6.17735201, 6.50122277, 6.98757029, 6.21974468, 6.23254868,
6.01774528, 6.05974286, 6.67711135, 6.43815164, 6.90026152,
6.60529331, 7.51491018, 6.28004293, 6.89126267, 7.73860541,
5.78791676, 6.36765971, 6.41850354, 6.54533046, 6.00048124,
6.50774262, 6.31926503, 6.03839661, 6.23417609, 10.1011548 ,
5.90497917, 6.72694197, 5.78220295, 7.7057908 , 6.23247667,
7.5918954 , 7.14371803, 5.97818049, 6.87801639, 6.98223823,
7.2394073 , 6.13972598, 6.46607702, 6.49840932, 6.70015525,
6.2685581 , 6.44240191, 6.76218428, 6.73102609, 6.53339506,
5.86657907, 5.85712727, 6.88702416, 6.10247286, 7.70827295,
6.45382478, 6.18855188, 6.07606587, 6.13560435, 10.30294978,
6.29470882, 5.90209596, 6.08350478, 7.63230932, 6.70484857,
9.63022515, 6.12094957, 6.26643233, 6.17443898, 6.25767367,
5.92892976, 6.09096015, 6.09902402, 7.50726711, 6.29600218,
7.00097664, 6.10817496, 6.18429471, 6.32463378, 6.69796716,
6.37851002, 6.228443 , 6.23730692, 5.88474541, 5.8715791 ,
6.07540186, 6.88833374, 6.09635401, 6.23430995, 6.29937116,
6.78890802, 6.28430197, 6.28020282, 6.45549883, 6.41738944,
6.92699572, 6.22823908, 6.83821192, 6.4514254 , 6.13960316,
6.23556794, 6.19202188, 6.43249585, 6.32030143, 6.81229528,
6.96880225, 6.28301938, 8.14355023, 5.79597718, 6.33117492,
6.62034586, 6.42710153, 6.96948342, 7.07781127, 6.04152942,
6.26586202, 6.07725802, 6.2032203 , 6.06966073, 6.19350306,
6.29402848, 7.61477042, 6.242309 , 6.81958526, 6.75626767,
6.85281891, 6.25224449, 6.75426994, 7.54826203, 6.47396891,
7.39998354, 7.14044824, 6.59217082, 6.53042313, 6.72911 ,
6.65924605, 7.80057532, 6.37843533, 6.64038683, 7.11642684,
6.21088791, 6.39348945, 6.19086852, 6.7037558 , 5.87377457,
6.72567083, 6.1590903 , 6.1803822 , 6.20109319, 5.93689428,
6.45243167, 7.11940902, 5.91837451, 6.14163465, 7.07281293,
6.26996673, 6.2670119 , 6.56209388, 6.29773825])
y_pred_svm_lin.min(), y_pred_svm_lin.max()
(1.5577924884974248, 10.302949784866557)
mean_squared_error(y_test, y_pred_svm_lin)
0.7271537216165603
svr_poly.fit(X_train_rfe, y_train)
y_pred_svm_poly = svr_poly.predict(X_test_rfe)
y_pred_svm_poly
array([6.61140938, 6.4773833 , 6.60235998, 6.47483699, 6.50607532,
6.60888589, 6.54067679, 6.55462297, 6.50361111, 6.71114908,
6.44720982, 6.46609567, 6.6202836 , 6.56078055, 6.50746971,
6.65675886, 6.62424123, 6.55035983, 6.57087123, 6.43808301,
6.44301156, 6.48597198, 6.60012308, 6.56967288, 6.56703604,
6.61469425, 6.6848414 , 6.54047188, 6.58472056, 6.68127796,
6.5839137 , 6.66098041, 6.52518901, 8.16658531, 6.44891575,
6.51520736, 6.45188039, 6.63662668, 6.52741146, 6.51701052,
6.51515787, 6.59074107, 6.4461414 , 6.70797561, 6.75929133,
6.43662025, 6.45851885, 6.45722518, 6.46173362, 6.40723848,
6.50765741, 6.49233972, 6.43583407, 6.59306773, 6.66039819,
6.49373131, 6.46272099, 6.54611998, 6.73143562, 6.60380581,
6.58396067, 6.60639259, 6.89961194, 6.61521628, 6.62120071,
6.50347357, 6.91203588, 6.43747247, 6.56143136, 6.61542901,
6.46658378, 6.91841274, 6.45483992, 6.48678091, 6.52129949,
6.47196879, 6.47514814, 6.57215749, 6.50089048, 6.45813863,
6.58092705, 6.49483549, 6.42835099, 6.49165519, 6.66972935,
6.71733406, 7.17025512, 6.50843619, 6.48166342, 6.49273584,
6.53716509, 6.65076272, 6.52422599, 6.64034195, 6.56437368,
6.68168709, 6.52442105, 6.55150612, 6.54293598, 6.54813601,
6.43650872, 6.53860887, 6.49681573, 6.48930853, 6.55736119,
6.47788413, 6.53795568, 6.51022953, 6.57724566, 6.51106278,
6.96322715, 6.58114286, 6.53093372, 6.43522012, 6.68807853,
6.59848912, 6.43996896, 6.7080719 , 6.79992874, 6.45919441,
6.52769697, 6.4741364 , 6.43882624, 6.52049083, 6.46737883,
6.52564504, 7.3708516 , 6.56221383, 6.47521854, 6.53812925,
6.55993173, 6.44910584, 6.46023058, 6.68997193, 6.54332413,
6.49751953, 6.4423544 , 6.56918497, 6.68840405, 7.09410022,
6.50706729, 6.54757056, 6.89518526, 6.45992974, 6.45978537,
6.54713251, 6.507497 , 6.53451804, 6.59153623, 6.48050357,
6.6547528 , 6.51298985, 6.59572682, 6.47345543, 6.46999745,
6.58033697, 6.52339697, 6.53548366, 6.52565792, 6.51910773,
6.40463213, 6.84974828, 6.69816261, 6.54255037, 6.56151556,
6.5322558 , 6.48452385, 6.63045686, 6.95751304, 6.49979779,
6.4684945 , 6.92809207, 6.51222366, 6.59530453, 6.48274285,
6.56529024, 6.74916681, 6.48548179, 6.5333694 , 6.55347486,
6.53139967, 6.42726539, 6.44575499, 6.45168241, 6.48507345,
6.48484577, 6.45924618, 6.89473276, 6.87506469, 6.69780432,
7.12750786, 6.53924964, 6.46894935, 6.64834347, 6.60046736,
6.77533193, 6.50802426, 6.55115286, 6.51793926, 6.52152224,
6.57053193, 6.51377702, 6.69471087, 6.44830363, 6.82095427,
6.60204418, 6.53112744, 6.47539461, 6.46671728, 6.67302109,
6.46941738, 6.57955075, 6.73743017, 6.45737172, 6.40717812,
6.50849979, 6.51583976, 6.56325076, 6.65534209, 6.51350055,
6.7329805 , 6.4617569 , 6.50210081, 6.44406716, 6.48531452,
6.5762248 , 6.52147391, 6.49832667, 6.51361335, 6.62726708,
6.65572628, 6.64879893, 6.47578595, 6.48396515, 6.49728315,
6.73579791, 6.53969074, 6.46910021, 6.6334369 , 6.57458299,
6.93274162, 6.48839481, 6.49859526, 6.51741741, 6.57047342,
6.66991717, 6.42589131, 6.58482186, 6.47507302, 6.58904868,
6.4312165 , 6.46560401, 6.53411455, 6.41625362, 6.49898603,
6.59634909, 6.62413757, 6.60133441, 6.43012028, 6.62666299,
6.49222682, 6.52069961, 6.47793696, 6.48540784, 6.43261378,
6.51366177, 6.52202053, 6.50562669, 6.45207554, 6.61164994,
6.42729464, 6.50457611, 6.46024027, 6.49176597, 6.51632913,
6.55087479, 6.44813431, 6.45032299, 6.50187507, 6.80076279,
6.87634887, 6.54149277, 6.56654102, 6.49922595, 6.81765429,
6.43350848, 6.47079293, 6.45738678, 6.64159816, 6.49653612,
6.48316535, 6.51485563, 6.55649696, 6.46651017, 6.53133901,
6.44291087, 6.58508653, 6.49306071, 6.50833272, 6.46468196,
6.5105289 , 6.52560349, 6.48831232, 6.41973611, 6.48907607,
6.4833734 , 6.55126893, 6.50265964, 6.4873327 , 6.50341307,
6.54268064, 6.44825606, 6.69899119, 7.19326526, 6.45043327,
6.54043174, 6.47966446, 6.55251639, 6.68783624, 6.4520874 ,
6.51943195, 6.44791874, 6.59888158, 6.52957345, 7.17466497,
6.47511593, 6.43878752, 6.70796214, 6.71939391, 6.54101406,
6.74206373, 6.58270544, 6.50743291, 6.4748673 , 6.53946011,
6.48374925, 6.67804885, 6.47548768, 6.4948679 , 6.46422548,
6.43192978, 6.63481179, 6.54115732, 6.45473474, 6.6471917 ,
6.48568339, 6.51563145, 6.50720363, 6.43336808, 6.55213708,
6.78595127, 6.48349529, 6.52858489, 6.52088933, 6.53618183,
6.62477419, 6.73533217, 6.7771133 , 6.46064001, 6.58150744,
6.56906698, 6.61000338, 6.47151223, 6.57770343, 6.57665069,
6.46053901, 6.54569595, 6.45718065, 6.53817053, 6.53452957,
6.52359962, 6.44612782, 6.53148736, 6.50917586, 6.74129643,
7.01041277, 6.43688403, 6.45055405, 6.59319941, 6.4709448 ,
6.57333895, 6.51769335, 6.50137462, 6.47930536, 6.54232817,
6.62222391, 6.52478352, 6.5203145 , 6.53539776, 6.52912713,
6.45544558, 6.4876149 , 7.42205897, 6.51488648, 6.44683813,
6.46654809, 6.46050823, 6.45435371, 6.50504387, 7.23126811,
6.41636927, 6.6266532 , 7.08208362, 6.45352358, 6.56027118,
6.65702786, 6.5193385 , 6.42461386, 6.58155507, 6.45448496,
6.66969438, 6.81177083, 6.50367976, 6.6579036 , 6.60000145,
6.45811412, 6.64473597, 6.4997191 , 6.80214815, 6.46792924,
6.54521788, 6.48916237, 6.56487227, 6.4420426 , 6.59101712,
6.4827025 , 6.67449279, 6.87228828, 6.56052763, 6.44643139,
6.44266101, 6.4979394 , 6.57257276, 6.56306207, 6.50087838,
6.55412328, 6.54266775, 6.7077751 , 6.50037484, 6.54637998,
6.59278775, 6.89331567, 6.48003925, 6.45306262, 6.55288138,
6.42600653, 6.50494859, 6.51903958, 6.50927313, 6.53677107,
6.43045809, 6.54205959, 6.48516378, 6.5515697 , 6.42164519,
6.55060572, 6.50011953, 6.46222315, 6.78896573, 6.46366487,
6.80597861, 6.70539595, 6.61887027, 6.50782312, 7.22421578,
6.52503229, 6.49334498, 6.58191662, 6.56174157, 6.50021563,
6.43945167, 6.44324456, 6.46259701, 6.51528249, 6.65790529,
6.48561531, 6.49007427, 6.45170311, 6.55205168, 6.52416267,
6.51209962, 6.58121858, 6.45468684, 6.73470408, 6.56690332,
6.49044529, 6.75238763, 6.60157164, 6.63346653, 6.44448961,
6.46639957, 6.78742627, 8.02926093, 6.45066804, 6.49185502,
6.54592342, 6.80528076, 6.68363706, 6.56140391, 6.47782122,
6.54504991, 6.50950708, 6.48404464, 6.52169349, 6.61855975,
6.50166948, 6.62909471, 6.5025395 , 6.57201635, 6.57870564,
6.45829297, 6.66173614, 6.52537251, 6.46706483, 6.53840152,
6.52186698, 6.57993872, 6.85336349, 6.474182 , 6.72031764,
6.5126278 , 6.51994974, 6.55943878, 6.4969262 , 6.42852678,
6.62393092, 7.02346679, 6.4448027 , 6.43492065, 6.60389975,
6.55993212, 6.5160796 , 6.58169975, 6.45502748, 6.50734893,
6.65825121, 6.47843026, 6.58735952, 6.45046905, 6.64038744,
6.6099303 , 6.53201787, 6.51998712, 6.44087678, 6.59500903,
6.52211898, 6.56775118, 6.51131922, 6.4402548 , 6.50689542,
6.50065321, 6.46053901, 6.70844154, 6.57670993, 6.50894183,
6.51890316, 6.59798697, 6.48116188, 6.49661981, 6.58626636,
6.60643184, 6.47023062, 6.53373675, 6.53471356, 6.42746627,
6.7350848 , 6.50308195, 6.51700794, 6.56928501, 6.50048428,
6.48274271, 6.49850006, 6.47227722, 6.52770435, 6.70572085,
6.81469073, 6.43044788, 6.44948677, 6.54168895, 6.4548511 ,
6.46504698, 6.40074372, 6.45729785, 6.43424565, 6.46251409,
6.53342956, 6.54260501, 6.61901931, 6.58708034, 6.5114947 ,
6.61190812, 6.43928138, 6.65743492, 6.57621984, 6.4877845 ,
6.46337922, 6.52303443, 6.61484163, 6.4607993 , 6.51614675,
6.81491269, 6.69438315, 6.55727138, 6.44491354, 6.43595276,
6.50046975, 7.40816936, 6.59863983, 6.53279165, 6.84753447,
6.5436147 , 6.56286798, 6.55119205, 6.47803206, 6.46783362,
6.4552895 , 6.45950572, 6.63396344, 6.56937254, 6.61157888,
6.58654699, 7.06694922, 6.49791729, 6.51447255, 6.79776348,
6.43578924, 6.52201485, 6.50697431, 6.57143269, 6.44606515,
6.52932363, 6.45361645, 6.59169471, 6.56507897, 6.69385462,
6.4565731 , 6.51909729, 6.44426563, 6.78931684, 6.52205392,
6.94350474, 6.63244014, 6.50076352, 6.63535184, 6.77624123,
6.82550369, 6.44894817, 6.52677123, 6.52822872, 6.51175292,
6.48308806, 6.53028152, 6.52534876, 6.63003158, 6.50325199,
6.4593927 , 6.4275962 , 6.62452748, 6.50652951, 6.56101003,
6.49186321, 6.47945548, 6.45344667, 6.4545689 , 7.79804019,
6.61991761, 6.42261287, 6.49554589, 6.57175743, 6.55299554,
6.92924763, 6.48853023, 6.4655736 , 6.52832766, 6.46067752,
6.43804257, 6.47589584, 6.43495153, 6.56053722, 6.4640192 ,
6.6807539 , 6.43528203, 6.46164626, 6.69923133, 6.54347069,
6.52810691, 6.45431385, 6.47975106, 6.4514629 , 6.44845905,
6.44267713, 6.58011779, 6.44802703, 6.44985307, 6.52708058,
6.50368982, 6.47173961, 6.48566964, 6.5430928 , 6.62204717,
6.58441958, 6.4728628 , 6.59180815, 6.53514623, 6.48322943,
6.5835415 , 6.45080327, 6.48879159, 6.46017866, 6.58296057,
6.5618972 , 6.45071101, 7.5862532 , 6.48677788, 6.47032774,
6.58822206, 6.50368554, 6.53767388, 6.57956053, 6.47915762,
6.52447003, 6.44362154, 6.49736236, 6.45364351, 6.49547872,
6.54359335, 6.658892 , 6.47330154, 6.55445612, 6.65910487,
6.5466105 , 6.47760572, 6.60787603, 6.77175771, 6.52714487,
6.75853794, 6.62504099, 6.58806928, 6.49793115, 6.64695094,
6.60674893, 7.28246348, 6.49795369, 6.53594601, 6.83939456,
6.49223699, 6.51183845, 6.43419481, 6.82100518, 6.42311616,
6.54464137, 6.46364298, 6.47710913, 6.50402414, 6.41926248,
6.49162677, 6.79420747, 6.42993129, 6.51054093, 6.65892136,
6.47670914, 6.48054862, 6.47698626, 6.47897685])
y_pred_svm_poly.min(), y_pred_svm_poly.max()
(6.40074371770222, 8.166585313922411)
mean_squared_error(y_pred_svm_poly, y_test)
0.9394577327410445
from sklearn import ensemble
n_trees=200
gradientboost = ensemble.GradientBoostingRegressor(loss='ls',learning_rate=0.03,n_estimators=n_trees,max_depth=4)
gradientboost.fit(X_train_rfe,y_train)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.03, loss='ls', max_depth=4,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=200,
n_iter_no_change=None, presort='auto',
random_state=None, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False)
y_pred_gb=gradientboost.predict(X_test_rfe)
error=gradientboost.loss_(y_test,y_pred_gb) ##Loss function== Mean square error
print("MSE:%.3f" % error)
MSE:0.444
mean_squared_error(y_pred_gb, y_test)
0.4442879667702143
y_pred_gb.min(), y_pred_gb.max()
(4.249968514984295, 8.620023088931495)
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search
param_grid = {
'loss' : ['ls'],
'max_depth' : [3, 4, 5],
'learning_rate' : [0.01, 0.001],
'n_estimators': [100, 200, 500]
}
# Create a base model
gb = ensemble.GradientBoostingRegressor()
# Instantiate the grid search model
grid_search_gb = GridSearchCV(estimator = gb, param_grid = param_grid,
cv = 3, n_jobs = -1, verbose = 2)
grid_search_gb.fit(X_train_rfe, y_train)
grid_search_gb.best_params_
Fitting 3 folds for each of 18 candidates, totalling 54 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 26.2s
[Parallel(n_jobs=-1)]: Done 54 out of 54 | elapsed: 40.8s finished
{'learning_rate': 0.01, 'loss': 'ls', 'max_depth': 5, 'n_estimators': 500}
grid_search_gb_pred = grid_search_gb.predict(X_test_rfe)
mean_squared_error(y_test.values, grid_search_gb_pred)
0.42984876460201366
from sklearn.ensemble import RandomForestRegressor
rf_regressor = RandomForestRegressor(n_estimators = 500)
rf_regressor.fit(X_train_rfe, y_train)
rf_pred = rf_regressor.predict(X_test_rfe)
mean_squared_error(rf_pred, y_test)
0.4608096143979058
Next, let's tune the hyperparameters of the RandomForestRegressor with a grid search to find the best parameters for the model.
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search
param_grid = {
'bootstrap': [True],
'max_depth': [90, 100],
'max_features': [2, 3],
'min_samples_leaf': [3, 4],
'min_samples_split': [8, 10],
'n_estimators': [100, 500, 1000]
}
# Create a base model
rf = RandomForestRegressor()
# Instantiate the grid search model
grid_search_rf = GridSearchCV(estimator = rf, param_grid = param_grid,
cv = 3, n_jobs = -1, verbose = 2)
grid_search_rf.fit(X_train_rfe, y_train)
grid_search_rf.best_params_
Fitting 3 folds for each of 48 candidates, totalling 144 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 48.8s
[Parallel(n_jobs=-1)]: Done 144 out of 144 | elapsed: 3.6min finished
{'bootstrap': True,
'max_depth': 100,
'max_features': 3,
'min_samples_leaf': 3,
'min_samples_split': 8,
'n_estimators': 500}
y_grid_pred_rf = grid_search_rf.predict(X_test_rfe)
mean_squared_error(y_grid_pred_rf, y_test.values)
0.45261327593335443
import xgboost as xgb
xg_model = xgb.XGBRegressor(n_estimators = 500)
xg_model.fit(X_train_rfe, y_train)
[02:28:18] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
importance_type='gain', learning_rate=0.1, max_delta_step=0,
max_depth=3, min_child_weight=1, missing=None, n_estimators=500,
n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)
results = xg_model.predict(X_test_rfe)
mean_squared_error(results, y_test.values)
0.40658616285849325
xg_model.score(X_train_rfe, y_train)
0.8473678944169633
from sklearn.metrics import r2_score
r2_score(y_test, results)
0.6076545422983535
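The score method of a regressor returns R², so the two numbers above are a training R² of about 0.85 and a test R² of about 0.61. A gap of that size suggests some overfitting. The sketch below shows how R² relates to the MSE, using hypothetical scores and predictions rather than the movie data, and then backs out the variance of y_test implied by the reported test MSE and test R².

```python
# Hypothetical targets and predictions, for illustration only.
y_true = [7.9, 7.1, 6.8, 8.5, 6.6]
y_hat = [7.5, 6.9, 6.7, 7.8, 6.4]

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Population variance of the targets (divide by n, not n - 1).
mean_true = sum(y_true) / len(y_true)
variance = sum((a - mean_true) ** 2 for a in y_true) / len(y_true)

r2_value = r2(y_true, y_hat)
r2_from_mse = 1.0 - mse(y_true, y_hat) / variance  # same quantity, via the MSE
print(round(r2_value, 4), round(r2_from_mse, 4))

# Variance of y_test implied by the reported test MSE and test R^2.
implied_variance = 0.40658616285849325 / (1.0 - 0.6076545422983535)
print(round(implied_variance, 3))
```

Because R² = 1 - MSE / Var(y), a test MSE of about 0.41 only means something relative to the spread of imdb_score in the test set, which works out to a variance of roughly 1.04 here.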
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search
param_grid = {
'max_depth': [3, 4],
'learning_rate' : [0.1, 0.01, 0.05],
'n_estimators' : [100, 500, 1000]
}
# Create a base model
model_xgb= xgb.XGBRegressor()
# Instantiate the grid search model
grid_search_xgb = GridSearchCV(estimator = model_xgb, param_grid = param_grid,
cv = 3, n_jobs = -1, verbose = 2)
grid_search_xgb.fit(X_train_rfe, y_train)
grid_search_xgb.best_params_
Fitting 3 folds for each of 18 candidates, totalling 54 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 35.3s
[Parallel(n_jobs=-1)]: Done 54 out of 54 | elapsed: 53.3s finished
[02:29:13] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
{'learning_rate': 0.05, 'max_depth': 4, 'n_estimators': 500}
y_pred_xgb = grid_search_xgb.predict(X_test_rfe)
mean_squared_error(y_test.values, y_pred_xgb)
0.40457361889369015
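The "Fitting 3 folds for each of N candidates" lines printed by the three grid searches follow directly from the size of each parameter grid. As a rough check, the sketch below counts the combinations with the standard library only. The grids are copied from the cells above; the helper name count_candidates is my own.

```python
from itertools import product

def count_candidates(grid):
    """Number of parameter combinations an exhaustive grid search would try."""
    return len(list(product(*grid.values())))

gb_grid = {
    'loss': ['ls'],
    'max_depth': [3, 4, 5],
    'learning_rate': [0.01, 0.001],
    'n_estimators': [100, 200, 500],
}
rf_grid = {
    'bootstrap': [True],
    'max_depth': [90, 100],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4],
    'min_samples_split': [8, 10],
    'n_estimators': [100, 500, 1000],
}
xgb_grid = {
    'max_depth': [3, 4],
    'learning_rate': [0.1, 0.01, 0.05],
    'n_estimators': [100, 500, 1000],
}

cv_folds = 3  # cv = 3 in every GridSearchCV call above
for name, grid in [('GradientBoosting', gb_grid),
                   ('RandomForest', rf_grid),
                   ('XGBoost', xgb_grid)]:
    n_candidates = count_candidates(grid)
    print(name, n_candidates, 'candidates,', n_candidates * cv_folds, 'fits')
```

The reported fit count covers only the cross-validation fits. With the default refit=True, GridSearchCV then trains the best configuration once more on the full training set.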
XGBoost has the lowest test error of the models tried, so we take it as the final model.
feature_importance = grid_search_xgb.best_estimator_.feature_importances_
sorted_importance = np.argsort(feature_importance)
pos = np.arange(len(sorted_importance))
plt.figure(figsize=(12,5))
plt.barh(pos, feature_importance[sorted_importance],align='center')
plt.yticks(pos, X_train_rfe.columns[sorted_importance],fontsize=15)
plt.title('Feature Importance ',fontsize=18)
plt.show()
Across all the models compared, the XGBRegressor with parameters {'learning_rate': 0.05, 'max_depth': 4, 'n_estimators': 500} gave the best result, with a test mean squared error of about 0.405. The feature importances from this model are shown above.
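To make the comparison explicit, the test MSE values printed in the cells above can be collected and ranked. This is only a summary sketch: the numbers are copied from the outputs above, the model labels are my own shorthand, and the RMSE column is added because it is in the same units as imdb_score.

```python
import math

# Test MSE values as printed in the cells above; labels are shorthand.
reported_mse = {
    'SVM (poly kernel)': 0.9394577327410445,
    'GradientBoosting': 0.4442879667702143,
    'GradientBoosting (grid search)': 0.42984876460201366,
    'RandomForest': 0.4608096143979058,
    'RandomForest (grid search)': 0.45261327593335443,
    'XGBoost': 0.40658616285849325,
    'XGBoost (grid search)': 0.40457361889369015,
}

# Sort from lowest (best) to highest test error.
ranked = sorted(reported_mse.items(), key=lambda item: item[1])
for name, value in ranked:
    print('{:32s} MSE={:.3f}  RMSE={:.3f}'.format(name, value, math.sqrt(value)))
```

The gaps between the tree ensembles are small and each model was scored on a single train/test split, so the ranking among them should be read with some caution.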
datasetC.head()
| | color | num_critic_for_reviews | duration | director_facebook_likes | actor_3_facebook_likes | actor_1_facebook_likes | gross | num_voted_users | cast_total_facebook_likes | facenumber_in_poster | ... | imdb_score | aspect_ratio | movie_facebook_likes | director_name_value_counts | actor_2_name_value_counts | main_genre | genres_value_counts | actor_1_name_value_counts | actor_3_name_value_counts | main_plot_keyword_value_counts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 723.0 | 178.0 | 0.0 | 855.0 | 1000.0 | 760505847.0 | 886204 | 4834 | 0.0 | ... | 7.9 | 1.78 | 33000 | 7 | 3 | 0 | 12 | 4 | 3 | 2 |
| 1 | 1 | 302.0 | 169.0 | 563.0 | 1000.0 | 40000.0 | 309404152.0 | 471220 | 48350 | 0.0 | ... | 7.1 | 2.35 | 0 | 7 | 7 | 0 | 25 | 38 | 4 | 1 |
| 2 | 1 | 602.0 | 148.0 | 0.0 | 161.0 | 11000.0 | 200074175.0 | 275868 | 11700 | 1.0 | ... | 6.8 | 2.35 | 85000 | 8 | 2 | 0 | 45 | 4 | 1 | 7 |
| 3 | 1 | 813.0 | 164.0 | 22000.0 | 23000.0 | 27000.0 | 448130642.0 | 1144337 | 106759 | 0.0 | ... | 8.5 | 2.35 | 164000 | 8 | 5 | 0 | 22 | 9 | 2 | 2 |
| 4 | 1 | 462.0 | 132.0 | 475.0 | 530.0 | 640.0 | 73058679.0 | 212204 | 1873 | 1.0 | ... | 6.6 | 2.35 | 24000 | 3 | 3 | 0 | 46 | 2 | 1 | 69 |
5 rows × 27 columns
To build a classification model, I reuse the preprocessed data from the regression model. However, I replace the continuous target with a new categorical target variable derived from imdb_score.
| imdb_score | Classify |
|---|---|
| 1-3 | Flop Movie |
| 3-6 | Average Movie |
| 6-10 | Hit Movie |
y_train_classification = y_train.copy()
y_train_classification = pd.cut(y_train_classification, bins=[1, 3, 6, float('Inf')], labels=['Flop Movie', 'Average Movie', 'Hit Movie'])
y_test_classification = y_test.copy()
y_test_classification = pd.cut(y_test_classification, bins=[1, 3, 6, float('Inf')], labels=['Flop Movie', 'Average Movie', 'Hit Movie'])
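One detail of pd.cut is worth checking here: by default the bins are closed on the right and open on the left, so the edges above define the intervals (1, 3], (3, 6] and (6, inf). The sketch below uses a handful of made-up scores (not rows from the dataset) to show where the boundary values land, and what the optional include_lowest=True argument changes.

```python
import pandas as pd

# Made-up scores chosen to sit on and around the bin edges.
scores = pd.Series([1.0, 3.0, 3.1, 6.0, 6.1, 9.3])
labels = ['Flop Movie', 'Average Movie', 'Hit Movie']
edges = [1, 3, 6, float('inf')]

# Same call as above: right-closed bins, lowest edge excluded.
default_cut = pd.cut(scores, bins=edges, labels=labels)

# Variant that also includes the lowest edge (a score of exactly 1.0).
inclusive_cut = pd.cut(scores, bins=edges, labels=labels, include_lowest=True)

print(pd.DataFrame({'score': scores,
                    'default': default_cut,
                    'include_lowest': inclusive_cut}))
```

With the default settings a score of exactly 6.0 is labelled 'Average Movie', and a score of exactly 1.0 would become NaN. Whether the latter matters depends on whether any such score exists in the data, which I have not checked here.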
With the target variable created, we now reuse the independent variables from the regression model.
X_train_rfe_classification = X_train_rfe.copy()
X_test_rfe_classification = X_test_rfe.copy()
Logistic regression is a linear classifier that is binary at its core but extends to multiclass problems. In scikit-learn, the 'lbfgs', 'newton-cg', 'sag' and 'saga' solvers can fit a multinomial model, while 'liblinear' is limited to one-vs-rest. I used 'saga' here. The L2 regularisation comes from the default penalty='l2', which 'saga' supports along with L1 and elastic net.
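As a rough illustration of how a linear model can produce multiclass predictions, the sketch below applies a softmax to one linear score per class. The scores are hypothetical stand-ins for w_k · x + b_k, not coefficients from the fitted model. Whether scikit-learn fits this multinomial form or a set of one-vs-rest binary models depends on the library version and the multi_class setting, so treat this as the general idea rather than a description of the model in this notebook.

```python
import math

def softmax(scores):
    """Turn one linear score per class into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ['Flop Movie', 'Average Movie', 'Hit Movie']

# Hypothetical linear scores for a single movie.
linear_scores = [-2.0, 0.5, 1.5]

probs = softmax(linear_scores)
predicted = classes[probs.index(max(probs))]

for label, prob in zip(classes, probs):
    print('{:14s} {:.3f}'.format(label, prob))
print('predicted:', predicted)
```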
from sklearn.linear_model import LogisticRegression
logit_model = LogisticRegression(solver = 'saga', random_state = 0)
logit_model.fit(X_train_rfe_classification, y_train_classification)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=0, solver='saga', tol=0.0001, verbose=0,
warm_start=False)
y_logit_pred = logit_model.predict(X_test_rfe_classification)
y_logit_pred
array(['Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie'], dtype=object)
from sklearn import metrics
count_misclassified = (y_test_classification != y_logit_pred).sum()
print('Misclassified samples: {}'.format(count_misclassified))
accuracy = metrics.accuracy_score(y_test_classification, y_logit_pred)
print('Accuracy: {:.2f}'.format(accuracy))
precision = metrics.precision_score(y_test_classification, y_logit_pred, average= 'macro')
print('Precision: {:.2f}'.format(precision))
recall = metrics.recall_score(y_test_classification, y_logit_pred, average= 'macro')
print('Recall: {:.2f}'.format(recall))
f1_score = metrics.f1_score(y_test_classification, y_logit_pred, average = 'macro')
print('F1 score: {:.2f}'.format(f1_score))
Misclassified samples: 190
Accuracy: 0.75
Precision: 0.47
Recall: 0.40
F1 score: 0.41
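The gap between the accuracy (0.75) and the macro-averaged precision and recall (0.47 and 0.40) is what macro averaging tends to produce when classes are imbalanced: every class counts equally, however rare it is. The printed predictions above contain only 'Hit Movie' and 'Average Movie', so if the test set holds any 'Flop Movie' rows, that class contributes zero to both averages. The sketch below reproduces the effect on a made-up ten-sample example; the helper name macro_precision_recall is my own.

```python
def macro_precision_recall(y_true, y_pred, labels):
    """Unweighted mean of per-class precision and recall."""
    precisions, recalls = [], []
    for label in labels:
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_predicted = sum(1 for p in y_pred if p == label)
        n_actual = sum(1 for t in y_true if t == label)
        # A class that is never predicted (or never present) scores 0 here.
        precisions.append(true_pos / n_predicted if n_predicted else 0.0)
        recalls.append(true_pos / n_actual if n_actual else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)

labels = ['Flop Movie', 'Average Movie', 'Hit Movie']

# Made-up labels: one flop, three average, six hits.
y_true_toy = ['Flop Movie'] + ['Average Movie'] * 3 + ['Hit Movie'] * 6
# Made-up predictions that never use the 'Flop Movie' class.
y_pred_toy = ['Average Movie', 'Average Movie'] + ['Hit Movie'] * 8

accuracy_toy = sum(t == p for t, p in zip(y_true_toy, y_pred_toy)) / len(y_true_toy)
precision_toy, recall_toy = macro_precision_recall(y_true_toy, y_pred_toy, labels)
print('Accuracy: {:.2f}'.format(accuracy_toy))
print('Macro precision: {:.2f}'.format(precision_toy))
print('Macro recall: {:.2f}'.format(recall_toy))
```

A confusion matrix or per-class report would show directly which classes drive the low macro scores.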
A support vector classifier is also binary at its core. scikit-learn's SVC handles multiclass problems by training one-vs-one classifiers internally. Setting decision_function_shape to 'ovo' only selects the shape of the decision_function output: the original one-vs-one form from libsvm, with shape (n_samples, n_classes * (n_classes - 1) / 2).
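For the three classes used here, the one-vs-one scheme means three pairwise classifiers, which matches n_classes * (n_classes - 1) / 2. The sketch below enumerates the pairs and combines them by simple majority vote. The pairwise winners are hypothetical, chosen only to show the mechanics, and the vote is a simplification of what libsvm does internally.

```python
from collections import Counter
from itertools import combinations

classes = ['Flop Movie', 'Average Movie', 'Hit Movie']

# One binary classifier per unordered pair of classes.
pairs = list(combinations(classes, 2))
n_classes = len(classes)
print(len(pairs), 'pairwise classifiers:', pairs)

# Hypothetical outcome of each pairwise classifier for a single movie.
pairwise_winner = {
    ('Flop Movie', 'Average Movie'): 'Average Movie',
    ('Flop Movie', 'Hit Movie'): 'Hit Movie',
    ('Average Movie', 'Hit Movie'): 'Hit Movie',
}

# Majority vote across the pairwise decisions.
votes = Counter(pairwise_winner[pair] for pair in pairs)
predicted_class = votes.most_common(1)[0][0]
print('votes:', dict(votes), '-> predicted:', predicted_class)
```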
from sklearn.svm import SVC
svc_linear_model = SVC(kernel='linear', C=100, gamma= 'scale', decision_function_shape='ovo', random_state = 42)
svc_linear_model.fit(X_train_rfe_classification, y_train_classification)
y_svc_linear_pred = svc_linear_model.predict(X_test_rfe_classification)
y_svc_linear_pred
array(['Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie'], dtype=object)
from sklearn import metrics
# Number of test samples whose predicted label differs from the true label
count_misclassified = (y_test_classification != y_svc_linear_pred).sum()
print('Misclassified samples: {}'.format(count_misclassified))
accuracy = metrics.accuracy_score(y_test_classification, y_svc_linear_pred)
print('Accuracy: {:.2f}'.format(accuracy))
# average='macro' takes the unweighted mean of the per-class scores
precision = metrics.precision_score(y_test_classification, y_svc_linear_pred, average='macro')
print('Precision: {:.2f}'.format(precision))
recall = metrics.recall_score(y_test_classification, y_svc_linear_pred, average='macro')
print('Recall: {:.2f}'.format(recall))
f1_score = metrics.f1_score(y_test_classification, y_svc_linear_pred, average='macro')
print('F1 score: {:.2f}'.format(f1_score))
Misclassified samples: 181
Accuracy: 0.76
Precision: 0.47
Recall: 0.45
F1 score: 0.46
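The gap between accuracy and the macro-averaged precision and recall reported above is easier to read once the macro average is written out. The sketch below is a minimal pure-Python illustration, not the notebook's code: the helper `macro_precision_recall` and the toy labels are assumptions, and 'Flop Movie' is a hypothetical third class used only to show how a class that is never predicted can pull the macro scores well below accuracy. Whether that is what happens in this notebook cannot be confirmed from the output shown. A class with no predictions is counted as precision 0 here, which is also an assumption of the sketch.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Return the unweighted mean of per-class precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)
    # A class that is never predicted (or never present) scores 0 here.
    precisions = [true_positives[l] / predicted[l] if predicted[l] else 0.0
                  for l in labels]
    recalls = [true_positives[l] / actual[l] if actual[l] else 0.0
               for l in labels]
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy labels, not the notebook's data; 'Flop Movie' is hypothetical.
y_true = ['Hit Movie'] * 6 + ['Average Movie'] * 3 + ['Flop Movie']
y_pred = ['Hit Movie'] * 6 + ['Average Movie'] * 3 + ['Hit Movie']

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_p, macro_r = macro_precision_recall(y_true, y_pred)
print('Accuracy: {:.2f}'.format(accuracy))
print('Macro precision: {:.2f}'.format(macro_p))
print('Macro recall: {:.2f}'.format(macro_r))
```

In this toy case only one sample out of ten is wrong, yet the macro scores drop sharply because each class contributes equally to the average regardless of how many samples it has.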
from sklearn.svm import SVC
svc_poly_model = SVC(kernel='poly', C=100, gamma='scale', degree=3,
                     decision_function_shape='ovo', random_state=42)
svc_poly_model.fit(X_train_rfe_classification, y_train_classification)
y_svc_poly_pred = svc_poly_model.predict(X_test_rfe_classification)
y_svc_poly_pred
array(['Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie'], dtype=object)
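A long prediction array like the one above is hard to scan by eye. As a minimal sketch, the label counts can be tallied instead. The list `example_pred` below is a hypothetical stand-in for an array such as `y_svc_poly_pred`, not the notebook's actual predictions.

```python
from collections import Counter

# Hypothetical stand-in for a prediction array such as y_svc_poly_pred.
example_pred = ['Hit Movie', 'Hit Movie', 'Average Movie',
                'Hit Movie', 'Average Movie']

# Count how often each label was predicted, most frequent first.
label_counts = Counter(example_pred)
for label, count in label_counts.most_common():
    print('{}: {} ({:.0%})'.format(label, count, count / len(example_pred)))
```

The same call should also work when given a NumPy array of string labels, since `Counter` only needs to iterate over hashable items.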
from sklearn import metrics
count_misclassified = (y_test_classification != y_svc_poly_pred).sum()
print('Misclassified samples: {}'.format(count_misclassified))
accuracy = metrics.accuracy_score(y_test_classification, y_svc_poly_pred)
print('Accuracy: {:.2f}'.format(accuracy))
precision = metrics.precision_score(y_test_classification, y_svc_poly_pred, average='macro')
print('Precision: {:.2f}'.format(precision))
recall = metrics.recall_score(y_test_classification, y_svc_poly_pred, average='macro')
print('Recall: {:.2f}'.format(recall))
f1_score = metrics.f1_score(y_test_classification, y_svc_poly_pred, average='macro')
print('F1 score: {:.2f}'.format(f1_score))
Misclassified samples: 143
Accuracy: 0.81
Precision: 0.52
Recall: 0.50
F1 score: 0.51
from sklearn.svm import SVC
svc_rbf_model = SVC(kernel='rbf', C=100, gamma='scale',
                    decision_function_shape='ovo', random_state=42)
svc_rbf_model.fit(X_train_rfe_classification, y_train_classification)
y_svc_rbf_pred = svc_rbf_model.predict(X_test_rfe_classification)
y_svc_rbf_pred
array(['Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie'], dtype=object)
from sklearn import metrics
count_misclassified = (y_test_classification != y_svc_rbf_pred).sum()
print('Misclassified samples: {}'.format(count_misclassified))
accuracy = metrics.accuracy_score(y_test_classification, y_svc_rbf_pred)
print('Accuracy: {:.2f}'.format(accuracy))
precision = metrics.precision_score(y_test_classification, y_svc_rbf_pred, average='macro')
print('Precision: {:.2f}'.format(precision))
recall = metrics.recall_score(y_test_classification, y_svc_rbf_pred, average='macro')
print('Recall: {:.2f}'.format(recall))
f1_score = metrics.f1_score(y_test_classification, y_svc_rbf_pred, average='macro')
print('F1 score: {:.2f}'.format(f1_score))
Misclassified samples: 146
Accuracy: 0.81
Precision: 0.51
Recall: 0.50
F1 score: 0.50
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search
param_grid = {
    'bootstrap': [True],
    'max_depth': [90, 100],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4],
    'min_samples_split': [8, 10],
    'n_estimators': [100, 500, 1000],
    'random_state': [0]  # single value: fixes the seed for every candidate
}
# Create a base model
rf_model_classification = RandomForestClassifier()
# Instantiate the grid search model (3-fold cross-validation, all CPU cores)
grid_search_rf_model_classificaiton = GridSearchCV(estimator=rf_model_classification, param_grid=param_grid,
                                                   cv=3, n_jobs=-1, verbose=2)
grid_search_rf_model_classificaiton.fit(X_train_rfe_classification, y_train_classification)
Fitting 3 folds for each of 48 candidates, totalling 144 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 53.4s
[Parallel(n_jobs=-1)]: Done 144 out of 144 | elapsed: 3.9min finished
GridSearchCV(cv=3, error_score='raise-deprecating',
estimator=RandomForestClassifier(bootstrap=True, class_weight=None,
criterion='gini', max_depth=None,
max_features='auto',
max_leaf_nodes=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=1,
min_samples_split=2,
min_weight_fraction_leaf=0.0,
n_estimators='warn', n_jobs=None,
oob_score=False,
random_state=None, verbose=0,
warm_start=False),
iid='warn', n_jobs=-1,
param_grid={'bootstrap': [True], 'max_depth': [90, 100],
'max_features': [2, 3], 'min_samples_leaf': [3, 4],
'min_samples_split': [8, 10],
'n_estimators': [100, 500, 1000],
'random_state': [0]},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring=None, verbose=2)
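The "48 candidates, totalling 144 fits" reported in the log above follows directly from the size of the parameter grid. The sketch below reproduces that arithmetic with the standard library only. The grid values are copied from the cell above, while the names `n_folds`, `candidates`, `n_candidates` and `total_fits` are illustrative and do not appear in the notebook.

```python
from itertools import product

# Same grid as in the notebook cell above.
param_grid = {
    'bootstrap': [True],
    'max_depth': [90, 100],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4],
    'min_samples_split': [8, 10],
    'n_estimators': [100, 500, 1000],
    'random_state': [0],
}
n_folds = 3  # matches cv=3 in the GridSearchCV call

# Every candidate is one combination of values, one per parameter.
candidates = list(product(*param_grid.values()))
n_candidates = len(candidates)
total_fits = n_candidates * n_folds
print('Candidates: {}, total fits: {}'.format(n_candidates, total_fits))
```

Because the fit count is the product of the list lengths times the number of folds, adding one more value to any list grows the run time multiplicatively. Once the search has been fitted, the selected combination can be read from the `best_params_` attribute of the `GridSearchCV` object.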
y_rf_classification_pred = grid_search_rf_model_classificaiton.predict(X_test_rfe_classification)
y_rf_classification_pred
array(['Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie'],
dtype=object)
from sklearn import metrics
count_misclassified = (y_test_classification != y_rf_classification_pred).sum()
print('Misclassified samples: {}'.format(count_misclassified))
accuracy = metrics.accuracy_score(y_test_classification, y_rf_classification_pred)
print('Accuracy: {:.2f}'.format(accuracy))
precision = metrics.precision_score(y_test_classification, y_rf_classification_pred, average= 'macro')
print('Precision: {:.2f}'.format(precision))
recall = metrics.recall_score(y_test_classification, y_rf_classification_pred, average= 'macro')
print('Recall: {:.2f}'.format(recall))
f1_score = metrics.f1_score(y_test_classification, y_rf_classification_pred, average = 'macro')
print('F1 score: {:.2f}'.format(f1_score))
Misclassified samples: 130
Accuracy: 0.83
Precision: 0.54
Recall: 0.50
F1 score: 0.51
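The gap between accuracy (0.83) and macro-averaged precision and recall (about 0.5) comes from how `average='macro'` works: each class's score is computed separately and then averaged with equal weight, so a rarely predicted class pulls the average down. The sketch below reimplements that averaging in plain Python on hypothetical toy labels (not the notebook's data); `macro_scores` is an illustrative helper, not part of scikit-learn.

```python
from collections import Counter

def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    precisions, recalls = [], []
    for label in labels:
        tp = pairs[(label, label)]
        predicted = sum(n for (t, p), n in pairs.items() if p == label)
        actual = sum(n for (t, p), n in pairs.items() if t == label)
        # A class that is never predicted gets precision 0 here.
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    accuracy = sum(pairs[(label, label)] for label in labels) / len(y_true)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)

# Hypothetical toy data: 8 hits, 2 average movies, and a model that always says "Hit Movie".
y_true = ['Hit Movie'] * 8 + ['Average Movie'] * 2
y_pred = ['Hit Movie'] * 10
accuracy, macro_precision, macro_recall = macro_scores(y_true, y_pred)
print(accuracy, macro_precision, macro_recall)
```

On this toy input the accuracy is high while the macro scores sit near one half, the same pattern as in the reported metrics.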
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search
param_grid = {
'max_depth': [10, 50, 90],
'max_features': [3],
'min_samples_leaf': [3],
'min_samples_split': [8, 10],
'n_estimators': [100, 500],
'learning_rate' : [0.1, 0.2],
'random_state' : [0]
}
# Create a base model
gbc_model_classification = GradientBoostingClassifier()
# Instantiate the grid search model
grid_search_gbc_model_classificaiton = GridSearchCV(estimator = gbc_model_classification, param_grid = param_grid,
cv = 3, n_jobs = -1, verbose = 2)
grid_search_gbc_model_classificaiton.fit(X_train_rfe_classification, y_train_classification)
Fitting 3 folds for each of 24 candidates, totalling 72 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 2.2min
[Parallel(n_jobs=-1)]: Done 72 out of 72 | elapsed: 3.6min finished
GridSearchCV(cv=3, error_score='raise-deprecating',
estimator=GradientBoostingClassifier(criterion='friedman_mse',
init=None, learning_rate=0.1,
loss='deviance', max_depth=3,
max_features=None,
max_leaf_nodes=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=1,
min_samples_split=2,
min_weight_fraction_leaf=0.0,
n_estimators=100,
n_iter_no...
subsample=1.0, tol=0.0001,
validation_fraction=0.1,
verbose=0, warm_start=False),
iid='warn', n_jobs=-1,
param_grid={'learning_rate': [0.1, 0.2], 'max_depth': [10, 50, 90],
'max_features': [3], 'min_samples_leaf': [3],
'min_samples_split': [8, 10],
'n_estimators': [100, 500], 'random_state': [0]},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring=None, verbose=2)
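The log line "Fitting 3 folds for each of 24 candidates, totalling 72 fits" follows directly from the grid: the number of candidates is the product of the list lengths in `param_grid`, and each candidate is fitted once per fold. As a minimal sketch, the count can be reproduced with the standard library alone (the grid is copied from the cell above; `candidates` and `total_fits` are illustrative names):

```python
from itertools import product

param_grid = {
    'max_depth': [10, 50, 90],
    'max_features': [3],
    'min_samples_leaf': [3],
    'min_samples_split': [8, 10],
    'n_estimators': [100, 500],
    'learning_rate': [0.1, 0.2],
    'random_state': [0],
}

# Every combination of one value per parameter is one candidate model.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

n_folds = 3  # cv=3 in the GridSearchCV call
total_fits = len(candidates) * n_folds
print(len(candidates), total_fits)
```

Adding one more value to any list multiplies the number of fits, which is worth checking before launching a search that already takes several minutes.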
y_gbc_model_pred = grid_search_gbc_model_classificaiton.predict(X_test_rfe_classification)
y_gbc_model_pred
array(['Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie'],
dtype=object)
from sklearn import metrics
count_misclassified = (y_test_classification != y_gbc_model_pred).sum()
print('Misclassified samples: {}'.format(count_misclassified))
accuracy = metrics.accuracy_score(y_test_classification, y_gbc_model_pred)
print('Accuracy: {:.2f}'.format(accuracy))
precision = metrics.precision_score(y_test_classification, y_gbc_model_pred, average= 'macro')
print('Precision: {:.2f}'.format(precision))
recall = metrics.recall_score(y_test_classification, y_gbc_model_pred, average= 'macro')
print('Recall: {:.2f}'.format(recall))
f1_score = metrics.f1_score(y_test_classification, y_gbc_model_pred, average = 'macro')
print('F1 score: {:.2f}'.format(f1_score))
Misclassified samples: 123
Accuracy: 0.84
Precision: 0.54
Recall: 0.52
F1 score: 0.53
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
param_grid = {
'objective' : ['multi:softmax', 'multi:softprob'],
'n_estimators': [100, 500, 1000],
'random_state': [0]
}
# Create a base model
xgb_model_classification = XGBClassifier()
# Instantiate the grid search model
grid_search_xgb_model_classificaiton = GridSearchCV(estimator = xgb_model_classification, param_grid = param_grid,
cv = 3, n_jobs = -1, verbose = 2)
grid_search_xgb_model_classificaiton.fit(X_train_rfe_classification, y_train_classification)
Fitting 3 folds for each of 6 candidates, totalling 18 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done 18 out of 18 | elapsed: 36.3s finished
GridSearchCV(cv=3, error_score='raise-deprecating',
estimator=XGBClassifier(base_score=0.5, booster='gbtree',
colsample_bylevel=1, colsample_bynode=1,
colsample_bytree=1, gamma=0,
learning_rate=0.1, max_delta_step=0,
max_depth=3, min_child_weight=1,
missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='binary:logistic',
random_state=0, reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=None, silent=None,
subsample=1, verbosity=1),
iid='warn', n_jobs=-1,
param_grid={'n_estimators': [100, 500, 1000],
'objective': ['multi:softmax', 'multi:softprob'],
'random_state': [0]},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring=None, verbose=2)
y_xgb_classification_pred = grid_search_xgb_model_classificaiton.predict(X_test_rfe_classification)
y_xgb_classification_pred
array(['Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Average Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Hit Movie',
'Hit Movie', 'Hit Movie', 'Average Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie', 'Hit Movie',
'Average Movie', 'Hit Movie', 'Hit Movie', 'Average Movie',
'Hit Movie', 'Average Movie', 'Hit Movie'], dtype=object)
from sklearn import metrics

# Number of test samples whose predicted class differs from the true class
count_misclassified = (y_test_classification != y_xgb_classification_pred).sum()
print('Misclassified samples: {}'.format(count_misclassified))
accuracy = metrics.accuracy_score(y_test_classification, y_xgb_classification_pred)
print('Accuracy: {:.2f}'.format(accuracy))
# average='macro' takes the unweighted mean of the per-class scores
precision = metrics.precision_score(y_test_classification, y_xgb_classification_pred, average='macro')
print('Precision: {:.2f}'.format(precision))
recall = metrics.recall_score(y_test_classification, y_xgb_classification_pred, average='macro')
print('Recall: {:.2f}'.format(recall))
f1_score = metrics.f1_score(y_test_classification, y_xgb_classification_pred, average='macro')
print('F1 score: {:.2f}'.format(f1_score))
Misclassified samples: 139
Accuracy: 0.82
Precision: 0.52
Recall: 0.51
F1 score: 0.51
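The gap between accuracy (0.82) and the macro-averaged precision and recall (about 0.5) is worth a note. With `average='macro'`, each class's score is computed separately and the unweighted mean is taken, so a class that is rarely or never predicted correctly pulls the average down however few samples it has. The sketch below reproduces that effect by hand on made-up labels. The `macro_scores` helper, the toy data, and the third `'Flop Movie'` class are assumptions for illustration only; they are not taken from the notebook, and this may or may not be what is happening in the real test set.

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall), computed by hand.

    Hypothetical helper for illustration. A class with no predicted
    (or no actual) samples contributes a score of 0.0 to the average.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls = [], []
    for label in labels:
        true_pos = sum(t == label and p == label for t, p in pairs)
        n_predicted = sum(p == label for p in y_pred)
        n_actual = sum(t == label for t in y_true)
        precisions.append(true_pos / n_predicted if n_predicted else 0.0)
        recalls.append(true_pos / n_actual if n_actual else 0.0)

    macro_precision = sum(precisions) / len(labels)
    macro_recall = sum(recalls) / len(labels)
    return accuracy, macro_precision, macro_recall


# Made-up labels: the single 'Flop Movie' sample is predicted as a hit,
# so the 'Flop Movie' class gets precision 0 and recall 0.
y_true_toy = ['Hit Movie'] * 6 + ['Average Movie'] * 3 + ['Flop Movie']
y_pred_toy = ['Hit Movie'] * 6 + ['Average Movie'] * 3 + ['Hit Movie']

accuracy_toy, precision_toy, recall_toy = macro_scores(y_true_toy, y_pred_toy)
print('Accuracy: {:.2f}'.format(accuracy_toy))
print('Macro precision: {:.2f}'.format(precision_toy))
print('Macro recall: {:.2f}'.format(recall_toy))
```

On this toy data only one of ten samples is wrong, yet the macro scores fall well below the accuracy because the missed class counts as much as the two large ones. If the class imbalance matters less than overall correctness, `average='weighted'` is an alternative worth comparing.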
Of the classifiers compared, Gradient Boosting with tuned hyperparameters gives the best results. Ensemble models can be prone to overfitting; nevertheless, we select the Gradient Boosting Classifier as the final classification model.
Taking the Gradient Boosting classifier, with 83% accuracy, as the final model, we plot its feature importances:
feature_importance = grid_search_gbc_model_classificaiton.best_estimator_.feature_importances_
sorted_importance = np.argsort(feature_importance)
pos = np.arange(len(sorted_importance))
plt.figure(figsize=(12,5))
plt.barh(pos, feature_importance[sorted_importance],align='center')
plt.yticks(pos, X_train_rfe.columns[sorted_importance],fontsize=15)
plt.title('Feature Importance ',fontsize=18)
plt.show()
Comparing the feature importances of the best regression model (XGBoost Regressor) and the best classification model (Gradient Boosting Classifier), we see that both models assign almost the same importance to the respective features. The results of all regression and classification models are as follows:
| Regression Model | Mean Squared Error |
|---|---|
| Simple Linear Regression | 0.70 |
| SVRegressor Linear | 0.72 |
| SVRegressor Polynomial | 0.93 |
| SVRegressor RBF | 0.68 |
| Gradient Boost | 0.43 |
| Random Forest | 0.45 |
| XGBoost | 0.40 |

| Classification Model | Misclassifications | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| Logistic Regression | 190 | 0.75 | 0.47 | 0.40 | 0.41 |
| SVC Linear | 181 | 0.76 | 0.47 | 0.45 | 0.46 |
| SVC Polynomial | 143 | 0.81 | 0.52 | 0.50 | 0.51 |
| SVC RBF | 146 | 0.81 | 0.51 | 0.50 | 0.50 |
| Random Forest | 130 | 0.83 | 0.54 | 0.50 | 0.51 |
| Gradient Boosting | 127 | 0.83 | 0.54 | 0.51 | 0.52 |
| XGBoost | 139 | 0.82 | 0.52 | 0.51 | 0.51 |
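As a quick cross-check of the model choice, the two tables can be ranked programmatically. The sketch below copies the numbers from the tables into plain dictionaries (the variable names are ours, not the notebook's), picks the regressor with the lowest mean squared error and the classifier with the highest F1-score, and converts each MSE to a root mean squared error. Reading the RMSE as a typical error in IMDb score points assumes the regression target was the unscaled `imdb_score`.

```python
import math

# Mean squared error per regression model, copied from the table above.
regression_mse = {
    'Simple Linear Regression': 0.70,
    'SVRegressor Linear': 0.72,
    'SVRegressor Polynomial': 0.93,
    'SVRegressor RBF': 0.68,
    'Gradient Boost': 0.43,
    'Random Forest': 0.45,
    'XGBoost': 0.40,
}

# (misclassifications, F1-score) per classifier, copied from the table above.
classification_results = {
    'Logistic Regression': (190, 0.41),
    'SVC Linear': (181, 0.46),
    'SVC Polynomial': (143, 0.51),
    'SVC RBF': (146, 0.50),
    'Random Forest': (130, 0.51),
    'Gradient Boosting': (127, 0.52),
    'XGBoost': (139, 0.51),
}

# Lowest MSE wins for regression.
best_regressor = min(regression_mse, key=regression_mse.get)

# RMSE is on the same scale as the target, which makes it easier to read.
regression_rmse = {name: math.sqrt(mse) for name, mse in regression_mse.items()}

# Highest F1 wins for classification; fewer misclassifications breaks ties.
best_classifier = max(
    classification_results,
    key=lambda name: (classification_results[name][1],
                      -classification_results[name][0]),
)

print('Best regressor: {} (RMSE {:.2f})'.format(
    best_regressor, regression_rmse[best_regressor]))
print('Best classifier: {}'.format(best_classifier))
```

This agrees with the selection made above: XGBoost for regression and Gradient Boosting for classification, although the margins over Random Forest are small in both tables.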